Abstract

This paper shows how agents' choice in communicative action can be 
designed to mitigate the effect of their resource limits in the context of 
particular features of a collaborative planning task. I first motivate a
number of hypotheses about effective language behavior based on a sta- 
tistical analysis of a corpus of natural collaborative planning dialogues. 
These hypotheses are then tested in a dialogue testbed whose design is 
motivated by the corpus analysis. Experiments in the testbed examine 
the interaction between (1) agents' resource limits in attentional capac- 
ity and inferential capacity; (2) agents' choice in communication; and (3) 
features of communicative tasks that affect task difficulty such as inferen- 
tial complexity, degree of belief coordination required, and tolerance for 
errors. The results show that good algorithms for communication must 
be defined relative to the agents' resource limits and the features of the 
task. Algorithms that are inefficient for inferentially simple, low coordi- 
nation or fault-tolerant tasks are effective when tasks require coordination 
or complex inferences, or are fault-intolerant. The results provide an ex- 
planation for the occurrence of utterances in human dialogues that, prima 
facie, appear inefficient, and provide the basis for the design of effective 
algorithms for communicative choice for resource limited agents. 
1 Introduction



Agents may engage in conversation for a range of reasons, e.g. to acquire infor- 
mation, to establish a contract, to make a plan, or to be social. At each point 
in a dialogue, agents must make communicative choices about what to say and 
how and when to say it. This paper focuses on agents' communicative choice 
in collaborative planning dialogues, dialogues whose purpose is to discuss and 
agree on a plan for future action, and potentially execute that plan. I will ar- 
gue that agents' choices in communicative action, their algorithms for language 
behavior, must be determined with respect to two relatively unexplored factors 
in models of collaborative planning dialogues: (1) agents' resource limits, such 
as limits in attentional and inferential capacity; and (2) features of collabora- 
tive planning tasks that affect task difficulty, such as inferential complexity, the 
degree of belief coordination required, and tolerance for errors. 

A primary dimension of communicative choice is the degree of explicitness. 
For example, consider a simple task of agent A and agent B trying to agree on 
a plan for furnishing a two room house. Imagine that A wants B to believe the 
proposition realized by 1 and believes that B can infer this from the propositions 
realized in 2:^

(1) If we agree to put the green couch in the study, we will have a matched-pair 
of furniture in the study. 



(2) a. I propose that we put the green couch in the study. 

b. We intend to put the green chair in the study. 

c. Two furniture items of the same color in the same room achieve a 
matched pair. 

In naturally-occurring dialogues, A may produce utterances realizing the
propositions in 3 to 6, or other variations [Sadock, 1978, Cohen, 1987,
Webber and Joshi, 1982, Walker, 1993].



(3) A: I propose that we put the green couch in the study. 



(4) A: We intend to put the green chair in the study. I propose that we put 
the green couch in the study. 



(5) A: Two furniture items of the same color in the same room achieve a 
matched pair. We intend to put the green chair in the study. I propose 
that we put the green couch in the study. 



(6) A: I propose that we put the green couch in the study. That will get us a 
matched pair. 

^ These examples are from the domain of Design-World to be discussed in section 4 and
are abstractions from naturally occurring examples in which the propositions realized here are 
realized in a number of different ways. Here the focus is on the logical relationships between 
the contents of each proposition: 2a and 2b are minor premises and 2c is the major premise 
for the inference under discussion. 

The communicative choices in 3 through 6 illustrate a general fact: for any
communicative act, the same effect can be achieved with a range of acts at 
various levels of explicitness. This raises a key issue: On what basis does A 
choose among the more or less explicit versions of the proposal in 3 to 6? 

The single constraint that has been suggested elsewhere in the literature is 
the REDUNDANCY CONSTRAINT: A should not say information that B already
knows, or that B could infer. The REDUNDANCY CONSTRAINT appears in the 
form of simple dictums such as 'Don't tell people what they already know', 
as Grice's Quantity Maxim 'do not make your contribution more informative
than is required' [Grice, 1975], and as constraints on planning operators for
the generation and recognition of communicative plans [Allen and Perrault,
1980, Cawsey, 199X, Cohen, 1978, Moore and Paris, 1993, Litman and Allen,
1990]. So, if we assume that B knows 2b and 2c, then the only possibility for
what A can say is 3.

The REDUNDANCY CONSTRAINT is based on the assumption that agent A
should leave implicit any information she believes that B already knows or she 
believes that B could infer, in other words, that agent B can always 'fill in 
what is missing' by a combination of retrieving facts from memory and making 
inferences. In section 2, I will show that agents in natural dialogues consistently
violate the REDUNDANCY CONSTRAINT. I will argue that this should not be
particularly surprising since the REDUNDANCY CONSTRAINT is based on several
simplifying assumptions: 

1. UNLIMITED WORKING MEMORY ASSUMPTION: everything an agent knows
is always available for reasoning; 

2. LOGICAL OMNISCIENCE ASSUMPTION: agents are logically omniscient; 

3. FEWEST UTTERANCES ASSUMPTION: utterance production is the only pro- 
cess that should be minimized; 

4. NO AUTONOMY ASSUMPTION: assertions and proposals by Agent A are 
accepted by default by Agent B. 

When agents are autonomous and resource-limited, these assumptions do 
not always hold, and the problem of communicative choice remains. 

The plan for the paper is as follows: section 2 motivates a number of
hypotheses about the relationship of communicative choice, resource limits and
task features using evidence from natural collaborative planning dialogues.
These hypotheses are the basis of a model of collaborative planning presented
in section 3. Then section 4 describes how the model is implemented in a
testbed for collaborative planning dialogues called Design-World, which
supports experiments on the interaction of agents' communicative choice,
resource limits, and features of the task. At this point, in section 4.1, I
review the steps of the method applied so far, and motivate the use of
simulation as a method for testing the hypotheses. Section 5 presents the
experimental results and discusses the extent to which the hypotheses were
confirmed, and then section 6 discusses the theoretical implications of these
results and the extent to which they can be generalized to other tasks, agent
properties, and communication strategies.



2 Communicative choice in Dialogue 

Naturally occurring collaborative planning dialogues are design, problem
solving, diagnostic or advice-giving dialogues [Cawsey et al., 1992, Cohen,
1987, Reichman, 1985, Grosz, 1977, Webber and Joshi, 1982, Finin et al., 1986,
Pollack, 1986, Clark and Schaefer, 1989, Traum, 1991, Whittaker and Stenton,
1988]. In order to generate hypotheses about the relation of communicative
choice to agent properties and task features, this section examines
communicative choice in naturally occurring collaborative planning dialogues.
Most of the examples discussed below are excerpts from a corpus of dialogues
from a radio talk show for financial planning advice [Pollack et al., 1982],
but I will also draw on data from collaborative design, collaborative
construction, and computer support dialogues [Cohen, 1984, Walker and
Whittaker, 1990, Whittaker and Stenton, 1988, Whittaker et al., 1993].



Dialogue, in general, is modeled as a process by which conversants add to
what is assumed to be already mutually believed or intended. This set of
assumed mutual beliefs and intentions is called the DISCOURSE MODEL, or the
COMMON GROUND [Webber, 1978, Stalnaker, 1978]. In collaborative planning
dialogues, the conversants are attempting to add mutual beliefs about the
current state of the world and mutual beliefs and intentions about a plan for
future action to the discourse model. It is obvious that the efficacy of the
final plan and the efficiency of the planning process must be affected by
agents' algorithms for communicative choice.

However, previous work has not systematically varied factors that affect
communicative choice, such as resource limits and task complexity. Furthermore,
most previous work has been based on the REDUNDANCY CONSTRAINT, and
apparently, its concomitant simplifying assumptions (but see [Zukerman and
Pearl, 1986, Zukerman and McConachy, 1993, Lenke, 1994]).

To explore the relation of communicative choice to effective collaborative 
planning, the analysis of naturally occurring collaborative planning dialogues in 
this paper focuses on communicative acts that violate the redundancy con- 
straint. These acts are informationally redundant utterances, IRUs, 
defined as:^ 

Definition of Informational Redundancy 

An utterance Ui is informationally redundant in a discourse 
situation S 



1. if Ui expresses a proposition pi, and another utterance Uj that 
entails pi has already been said in S. 

2. if Ui expresses a proposition pi, and another utterance Uj that 
presupposes or implicates pi has already been said in S. 
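
The two clauses of the definition can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the `Utterance` record, its hand-listed `entails` and `defaults` sets, the proposition labels, and the function `find_antecedent` are all hypothetical names introduced here, and a real analysis would have to derive entailments, presuppositions and implicatures rather than list them by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    speaker: str
    proposition: str
    # Hand-listed entailments of this utterance (an assumption of the sketch).
    entails: set = field(default_factory=set)
    # Hand-listed presuppositions and implicatures (default inferences).
    defaults: set = field(default_factory=set)


def find_antecedent(utterance, situation):
    """Return the first earlier utterance that makes `utterance` an IRU, or None."""
    for prior in situation:
        # Clause 1: the prior utterance entails the proposition
        # (trivially so when it expresses the same proposition).
        # Clause 2: the prior utterance presupposes or implicates it.
        licensed = {prior.proposition} | prior.entails | prior.defaults
        if utterance.proposition in licensed:
            return prior
    return None


# Dialogue 7: M repeats in 7-27 what H asserted in 7-26.
u26 = Utterance("H", "transfer-without-penalty")
u27 = Utterance("M", "transfer-without-penalty")
new_info = Utterance("M", "caller-has-own-account")

antecedent = find_antecedent(u27, [u26])        # u26 is the antecedent of the IRU
not_redundant = find_antecedent(new_info, [u26])  # no antecedent, so not an IRU
```

The sketch checks one antecedent at a time, as the definition does; inferences that need several premises jointly would require closing the whole discourse situation under the relevant inference rules.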

A statistical analysis of the financial advice corpus showed that about 12%
of the utterances are IRUs. As mentioned in section 1, this should not be
particularly surprising since the definition of IRUs reflects several simplifying
assumptions. For example, the definition reflects the LOGICAL OMNISCIENCE
assumption because it assumes that all the entailments of propositions uttered
in a discourse and certain default inferences from propositions uttered in a
discourse become part of the discourse model.^ The definition reflects the NO
AUTONOMY assumption because it assumes that merely saying an utterance
Uj that expresses a proposition p is sufficient for adding p to the discourse
model. The fact that IRUs occur shows that the simplifying assumptions are
not valid.

^The corpus consists of 55 dialogues from 5 hours of live radio broadcast, where each
dialogue ranged in length from 23 to 100 turns.

^The first part of the definition is a variation on Hirschberg's definition of
redundant [Hirschberg, 1985], which is used in her theory of scalar implicature. This view
of information is also the basis of information theoretic work such as [Barwise, ...].

^Presuppositions and implicatures are two types of default inferences [Grice, 1967,
Karttunen and Peters, 1979, Gazdar, 1979, Levinson, 1983, Thomason, 1990a]. The corpus
analysis tagged defaults separately from entailments but found no evidence for a functional
difference (see Walker93c).

The distributional analysis suggests that there are at least 3 functional cat- 
egories of IRUs: 

Communicative Functions of IRUs: 



• Attitude: to provide evidence supporting beliefs about mutual under- 
standing and acceptance 

• Attention: to manipulate the locus of attention of the discourse partici- 
pants by making a proposition salient. 

• Consequence: to augment the evidence supporting beliefs that certain 
inferences are licensed 

IRUs have antecedents in the dialogue which are the utterances that orig- 
inally realized the content of the IRU either through direct assertion or by an 
inferential chain; in the definition above Uj is an antecedent for Ui. The 3 com- 
municative functions of IRUs were identified by correlations with distributional 
features based in part on relations between the IRU and its antecedent, such as 
textual distance, discourse structure relations, and logical relations. The dis- 
tributional analysis also analyzed utterance features such as the intonational 
realization of the IRU, the form of the IRU, and the relation of the IRU to 
adjacent utterances. 

Below, I will briefly give examples of each type of IRU.^ For each type I 
will explain how the four simplifying assumptions of previous dialogue models 
predict that the utterance is informationally redundant. Then we will consider 
hypothetical agent and task properties under which IRUs function as hypothe- 
sized above. 



2.1 Attitude IRUs 

Attitude IRUs provide evidence supporting beliefs about mutual understanding 
and acceptance by demonstrating the speaker's attitude to an assertion or pro- 
posal made by another agent in dialogue. An Attitude IRU, said with a falling 
intonation typical of a declarative utterance, is given in 7-27, where M repeats 
what H has asserted in 7-26. M and H have been discussing how M and her 
husband can handle funds invested in IRAs (Individual Retirement Accounts). 
In 7, and in the other naturally occurring examples below, the antecedents 
of the IRUs are italicized and the IRUs are in CAPS. 



(7) (24) H: That is correct. It could be moved around so that each of you 
have two thousand. 

(25) M: I see. 

(26) H: Without penalty. 

(27) M: WITHOUT PENALTY. 

(28) H: Right. 

(29) M: And the fact that I have a, an account of my own from a couple
of years ago, when I was working, doesn't affect this at all.

^Each communicative function given above includes a number of subtypes that will not be
represented by these examples. In addition, the hypothesis that IRUs are a rehearsal
mechanism, i.e. agents repeat propositions as an aid to memory, is tested in every experiment
by the model of Attention/Working memory. The hypothesis that agents say IRUs because they
cannot think of anything else to say (the DEAD AIR hypothesis) was considered in [Walker, 1992],
but I as yet have found no evidence to support it. For example, other indications of hesitation
or planning what to say, such as disfluencies and long pauses, are not associated with IRUs.



The IRU in 27 provides direct evidence that M heard exactly what H said
[Clark and Schaefer, 1989, Brennan, 1990]. According to arguments elaborated
below and elsewhere [Walker, 1992, Walker, 1993], M's response indirectly
provides evidence that she accepts and therefore believes what H has asserted.



header: INFORM(speaker, hearer, proposition)

precondition: KNOW(speaker, proposition)

want-precondition: speaker want INFORM(speaker, hearer, proposition)

effect: KNOW(hearer, proposition)



Figure 1: Definition of the INFORM plan operator in [Allen and Perrault, 1980]



The classification of 7-27 as an IRU follows from the NO AUTONOMY
assumption. The NO AUTONOMY assumption is usually characterized as an
agent being "co-operative" or "helpful". For example, the motivation for the
plan effect of the INFORM planning operator from [Allen and Perrault, 1980] in
figure 1 is that the hearer is cooperative. In other words, a cooperative hearer
always accepts and therefore believes (or knows) what the speaker has previously
asserted. But if the effect of the inform act always goes through, then there is 
no reason for M to choose to produce an Attitude IRU in 7-27, in response to 
H's inform in 7-26. 
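
To make the consequence of this reading concrete, the operator in figure 1 can be written down as data together with an update rule in which the effect always goes through. This is a minimal sketch: the `PlanOperator` record, the function `apply_inform`, the belief-set representation and the proposition label are hypothetical names introduced for illustration, not part of Allen and Perrault's formalism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanOperator:
    header: str
    precondition: str
    want_precondition: str
    effect: str


# The INFORM operator of figure 1, stored as plain strings.
INFORM = PlanOperator(
    header="INFORM(speaker, hearer, proposition)",
    precondition="KNOW(speaker, proposition)",
    want_precondition="speaker want INFORM(speaker, hearer, proposition)",
    effect="KNOW(hearer, proposition)",
)


def apply_inform(beliefs, speaker, hearer, proposition):
    """Apply INFORM under the NO AUTONOMY assumption: the effect always holds."""
    if proposition not in beliefs[speaker]:
        raise ValueError("precondition KNOW(speaker, proposition) fails")
    # Copy the belief sets so the caller's state is left untouched.
    updated = {agent: set(props) for agent, props in beliefs.items()}
    updated[hearer].add(proposition)
    return updated


# Dialogue 7: H's INFORM in 7-26, then M's repetition in 7-27.
before = {"H": {"transfer-without-penalty"}, "M": set()}
after_26 = apply_inform(before, "H", "M", "transfer-without-penalty")
after_27 = apply_inform(after_26, "M", "H", "transfer-without-penalty")
repetition_changes_nothing = after_27 == after_26
```

Under this update rule the belief state after 7-27 is identical to the state after 7-26, which is exactly why the operator alone gives M no reason to produce the repetition.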

In recent work, the plan effect shown in figure 1 is treated as a default
[Joshi et al., 1986, Reiter, 1980, Perrault, 1990, Grosz and Sidner, 1990].
Perrault's Belief Transfer Rule handles inferring the default acceptance of
assertions (INFORM acts), while Grosz and Sidner's Conversational Default Rule
CDR2 handles inferring the default acceptance of proposals [Grosz and Sidner,
1990, Perrault, 1990]. In both cases, the default inference of acceptance of an
assertion of P or a proposal to achieve P depends on the cooperativity of the
hearer and on whether or not the hearer previously believed or intended to
achieve ¬P. However, Attitude IRUs are produced in many situations where there
is no reason for the default not to go through. In advice giving dialogues, the
hearer is cooperative and the hearer does not previously believe or intend to
achieve ¬P, yet Attitude IRUs are common when the caller asks the talk show
host a question and then repeats or paraphrases his response to the question
with an Attitude IRU.

Clark and Schaefer proposed that Attitude IRUs provide positive evidence of
understanding [Clark and Schaefer, 1987, Clark and Schaefer, 1989, Brennan and
Hulteen, 1995]. They allow for understanding to be implicitly conveyed, but say
that the amount
of explicit positive evidence should be 'sufficient for current purposes'. How- 
ever, Clark and Schaefer do not address the question of belief transfer since they 
do not distinguish between indicating understanding and indicating acceptance. 
Furthermore, they make no predictions about what features of current purposes 
require more or less positive evidence, and thus lead an agent to produce an 
Attitude IRU. 

Thus, neither the addition of defaults nor the positive evidence model makes 
any predictions about when an agent should produce an Attitude IRU, since the 
inference of acceptance goes through by default without the Attitude IRU. 

In order to explain the function of Attitude IRUs, the NO AUTONOMY
assumption must be replaced with the assumption that hearers always either
explicitly or implicitly accept or reject each utterance act that is intended
to change their beliefs or intentions. In section 3, these observations are
incorporated into the model of collaborative planning dialogue. Results from
testing hypotheses related to the choice to produce Attitude IRUs are
presented elsewhere [Walker, 199^, Walker, 1993b, Walker, 1994b], and will not
be discussed further in this paper.



2.2 Consequence IRUs

Consequence IRUs make inferences explicit. For example, consider 8-17: 

(8) (15) H: Oh no. I R A's were available as long as you are not a
participant in an existing pension

(16) J: Oh I see. Well I did work, I do work for a company that has a
pension

(17) H: ahh. THEN YOU'RE NOT ELIGIBLE FOR EIGHTY ONE



In 8, 8-15 realizes a biconditional inference rule, 8-16 instantiates one of the
premises of this rule, and 8-17 realizes an inference that follows from 8-15 and
8-16, for the particular tax year of 1981, by the inference rule of modus tollens.

The definition of 8-17 as an IRU follows from the LOGICAL OMNISCIENCE
ASSUMPTION. If all entailments of utterances are automatically added to the
discourse model then 8-17 should not occur. However, it is well known that
neither human nor artificial agents are logically omniscient [Norman and
Bobrow, 1975, Johnson-Laird, 1991, Goldman, 1986, Konolige, 1986, Hayes-Roth
and Thorndyke, 1979]. Agents might not have enough time to make all the
relevant inferences even when they know all the relevant inference rules
[Konolige, 1985], especially since producing and interpreting speech in real
time has heavy planning and processing requirements. Thus plausible hypotheses
are that:

• HYPOTH-C1: Agents choose to produce Consequence IRUs to demonstrate
that they made the inference that is made explicit.

• HYPOTH-C2: Agents choose to produce Consequence IRUs to ensure
that the other agent has access to inferrable information.

These hypotheses are motivated by the fact that agents are not logically
omniscient. In addition, in the case of hypothesis C2, agents may choose to
produce Consequence IRUs to ensure that the other agents have access to
inferrable information in a timely manner, even when, in principle, they believe
the other agent is capable of making the inference.

However, much of communicative efficiency relies on the fact that agents do
make some inferences from what has been said. Thus plausible refinements of
hypotheses C1 and C2 are that:

• HYPOTH-C3: The choice to produce a Consequence IRU is directly
related to a measure of 'how hard' the inference is.

• HYPOTH-C4: The choice to produce a Consequence IRU is directly
related to a measure of 'how important' the inference is.

• HYPOTH-C5: The choice to produce a Consequence IRU is directly
related to the degree to which the task requires agents to be coordinated on
the inferences that they have made.

Confirmation of these hypotheses entails that the FEWEST UTTERANCES
assumption does not hold whenever processing effort is relevant to achieving the
conversational goals.



2.3 Attention IRUs 

Attention IRUs manipulate the locus of attention of the discourse participants 
by making a proposition salient. Attention IRUs often realize facts that are 
inferentially related to the assertion or proposal that the speaker is making. For 
example, consider 9 said by agent A to agent B while walking to work: 

(9) a. Let's walk along Walnut St. 

b. IT'S SHORTER. 

Agent B already knew that the Walnut Street route was shorter, so, by the 
REDUNDANCY CONSTRAINT, A should have simply said 9a.

The classification of 9b as an IRU reflects the unlimited working memory 
assumption. If everything an agent knows is always available for reasoning, then 
agents should never make communicative choices to include utterances such as 
9b. However, it is well known that human agents have limited attention/working
memory [Miller, 1956, Norman and Bobrow, 1975, Baddeley, 1986], and resource
bounded artificial agents with limited time to access memory also have limited
attentional capacity.

If we define salient propositions as those that are accessible to a resource
limited agent at a particular point in time [Prince, 1981, Prince, 1992], then a
possible hypothesis is that A said 9b to provide B with a salient reason to accept
A's proposal to walk along Walnut St. Similar observations apply to (10):

(10) a. Clinton has to take a stand on abortion rights for poor women. 

b. HE'S THE PRESIDENT. 

Here (10b) is already known to the discourse participants, but saying it 
makes it salient. In order to account for 10b we must modify the specific hy- 
pothesis above to reflect the difference in utterance type between 9a and 10a. 
Utterance 9a is a proposal whereas 10a is an INFORM. In 10, A said 10b to 
provide B with a salient reason to accept A's assertion about Clinton's obliga- 
tions. Utterance 9b is a warrant for adopting A's proposal in 9a, and 10b is 
SUPPORT for belief in A's assertion.^ 

• HYPOTH-A1: Agents produce Attention IRUs to support the processes
of deliberating beliefs and intentions.

Hypothesis A1 means that the production of Attention IRUs is a surface
manifestation of the fact that agents' limited working memory limits the
accessibility of beliefs used as the basis of deliberation. The hypothesis that the
function of Attention IRUs is to make information salient to support
interpretation and reasoning is formulated in the DISCOURSE INFERENCE CONSTRAINT:

HYPOTH-A2: There is a DISCOURSE INFERENCE CONSTRAINT whose
effect is that inferences in dialogue are derived from propositions that
are currently discourse salient (in working memory).
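
One way to picture the constraint is with a toy working memory of fixed capacity in which an inference is available only while all of its premises are salient. The sketch below is illustrative and rests on assumptions that are not part of the hypothesis itself: the `WorkingMemory` class, the capacity of 3, the first-in-first-out displacement policy, and the proposition labels (loosely based on the advice about 6 month certificates) are all hypothetical.

```python
from collections import deque


class WorkingMemory:
    """Bounded store of salient propositions; the oldest is displaced first."""

    def __init__(self, capacity):
        self.salient = deque(maxlen=capacity)

    def attend(self, proposition):
        # Re-mentioning a proposition moves it back to the most recent slot.
        if proposition in self.salient:
            self.salient.remove(proposition)
        self.salient.append(proposition)


def can_infer(memory, premises):
    """An inference is available only if every premise is currently salient."""
    return all(p in memory.salient for p in premises)


def hearer_after(utterances, capacity=3):
    """Return the hearer's working memory after attending to each utterance in turn."""
    memory = WorkingMemory(capacity)
    for proposition in utterances:
        memory.attend(proposition)
    return memory


premises = ["all-six-month-certificates", "dislikes-eggs-in-one-basket"]
digression = [f"retirement-{i}" for i in range(14)]

# No restatement: the first premise is displaced by the 14-utterance digression.
without_iru = hearer_after(
    ["all-six-month-certificates"] + digression + ["dislikes-eggs-in-one-basket"]
)
# An Attention IRU restates the first premise just before the second is said.
with_iru = hearer_after(
    ["all-six-month-certificates"]
    + digression
    + ["all-six-month-certificates", "dislikes-eggs-in-one-basket"]
)

inference_without_iru = can_infer(without_iru, premises)
inference_with_iru = can_infer(with_iru, premises)
```

In this toy setting the restated premise is what makes the inference available: without it the hearer holds only one of the two premises when the second is uttered.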

^The relationship between these utterances has been characterized as the inference of a
discourse relation [Mann and Thompson, 1987], or the inference of the speaker's intention
[Moore and Paris, 1993]. Moser and Moore and Hobbs have argued that these two views are
functionally equivalent [Moser and Moore, 1993, Hobbs, 1994].

The DISCOURSE INFERENCE CONSTRAINT is quite general since the inferences
that A intends B to make may be any inferences related to the dialogue
such as logical deductions, commonsense defaults, inferring part of A's plan,
or inferring relations such as WARRANT or SUPPORT [Webber and Joshi, 1982,
Moore and Paris, 1993, Walker, 1994a].



A more complex example illustrating the relationship of limited working 
memory, inferential processing, and agents' communicative choice is dialogue 
11. The caller E has been telling H, the talk show host, how all her money is
invested and then poses a question in 11-3:

(11) ( 3) E: and I was wondering - should I continue on with the 

certificates or 

( 4) H: Well it's difficult to tell because we're so far away from any of 
them. But I would suggest this - if all of these are 6 month certificates 
and I presume they are 
( 5) E: yes 

( 6) H: then I would like to see you start spreading some of that money 
around 

( 7) E: uh huh 

( 8) H: Now in addition, how old are you? 

(discussion about retirement investments consisting of 14 utterances) 
(21) E: uh huh and 

(22a) H: But as far as the certificates are concerned, 
(22b) I'D LIKE THEM SPREAD OUT A LITTLE BIT. 
(22c) THEY'RE ALL 6 MONTH CERTIFICATES. 

(23) E: yes 

(24) H: and I don't like putting all my eggs in one basket 



The utterances in 22b and 22c realize two propositions established as mutu- 
ally believed in utterances 4 to 7, thus they are IRUs. Utterance 8 initiates a 
subdialogue digression about retirement investments. Since the discussion about
retirement investments consists of 14 utterances in which the information in 4
to 7 is not discussed, a plausible hypothesis is that, at 22a, H believes that the
information expressed in 4 to 7 is no longer salient [Walker, 1995a]. However,
H expects E to use this information to make two inferences: (1) that having 
all your investments in 6 month certificates is an instance of the negatively 
evaluated condition of having all your eggs in one basket; and (2) that this is 
a WARRANT for E to adopt the intention to spread the certificates out a little 
bit. Here, therefore, we see two types of inferences: a content-based inference, 
INSTANCE OF, in the first case and a deliberation-based inference, warrant, 
in the second. It appears that H produces IRUs to ensure that these inferences 
get made and that H is basing his communicative choice on the DISCOURSE 
inference constraint. 

In addition to the naturally occurring examples of Attention IRUs, another
source of evidence for the DISCOURSE INFERENCE CONSTRAINT is the
distribution of IRUs that make inferences explicit such as the Consequence IRU
in dialogue 8-17. Figure 2 contrasts the distribution of Consequence IRUs and
paraphrases, which are two different ways in which an IRU can relate logically
to the prior discourse.^ Paraphrases are syntactic or semantic transformations
of a single antecedent utterance [McKeown, 1983, Joshi, 1964]. Inferences are
distinguished from paraphrases by requiring the application of a logical inference
rule such as modus ponens. A key difference is that inferences have multiple
antecedents while paraphrases do not. A priori we would not expect inferences
and paraphrases, as two types of entailments, to distribute differently
with respect to whether their antecedents are salient.^ However, figure 2 shows
that INFERENCES are more likely to have salient premises than paraphrases
(χ² = 4.835, p < .05, df = 1). This distributional fact provides evidence for the
DISCOURSE INFERENCE CONSTRAINT because whenever we have evidence that
an inference has been made, the premises are likely to be salient.

^The other categories are repetitions, making implicatures explicit and making
presuppositions explicit.

                    Antecedents Salient    Antecedents Not Salient
Consequence IRUs            24                        8
Paraphrase IRUs             43                       39

Figure 2: Distribution of Consequence IRUs that make inferences explicit, as
compared with Paraphrases, according to whether their antecedents are
currently salient
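
The reported statistic can be checked directly from the Figure 2 counts, read here as 24 salient and 8 not salient for Consequence IRUs, and 43 salient and 39 not salient for paraphrases. The sketch below assumes the test is Pearson's chi-square on a 2x2 table without a continuity correction, which is the variant that reproduces the value given in the text; the function name `chi_square_2x2` is introduced only for illustration.

```python
def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 contingency table, no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: Consequence IRUs, Paraphrase IRUs.
# Columns: antecedents salient, antecedents not salient.
counts = [(24, 8), (43, 39)]

chi2 = chi_square_2x2(counts)

# Critical value of the chi-square distribution for p = .05 with df = 1.
CRITICAL_05_DF1 = 3.841
significant = chi2 > CRITICAL_05_DF1

# Proportion of each IRU type whose antecedents are salient.
consequence_salient = 24 / (24 + 8)   # 0.75
paraphrase_salient = 43 / (43 + 39)   # about 0.52
```

The computed statistic is approximately 4.835, above the 3.841 threshold, and the proportions show the direction of the effect: three quarters of the Consequence IRUs have salient antecedents, against roughly half of the paraphrases.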

The data discussed above provide evidence for the DISCOURSE INFERENCE
CONSTRAINT; however, it is clear that the effect of the constraint is strongly
determined by the limits on working memory. In particular, a corollary of the 
constraint is that inferential complexity can be directly related to the number 
of premises that must be simultaneously salient for the inference to be made. 
These hypotheses can be summarized as follows: 

• HYPOTH-A3: The choice to produce an Attention IRU is related to the
degree of inferential complexity of a task as measured by the number of 
premises required to make task related inferences. 

• HYPOTH-A4: The choice to produce an Attention IRU is related to the
degree to which an agent is resource limited in attentional capacity. 

Finally, it is obvious that various tasks can be characterized in terms of the 
degree of inferential complexity, and that observations about belief coordination 
similar to those made about Consequence IRUs also apply to Attention IRUs, 
giving hypothesis A5. 

• HYPOTH-A5: The choice to produce an Attention IRU is related to the
degree to which the task requires agents to be coordinated on the infer- 
ences that they have made. 

In the next sections we will see how we can test these hypotheses. 

^For the corpus analysis, salient utterances are those within the last two turns. This
measure is not perfect but it is replicable.



3 Modeling resource-limited collaborative planning dialogues

Figure 3: The IRMA Agent Architecture for Resource-Bounded Agents with
Limited Attention (AWM)

The naturally occurring examples discussed in the previous section gave rise to
a number of hypotheses as to the situations in which communicative choices
to include IRUs could either improve the efficacy of a collaborative plan or the
efficiency of the dialogue by which that plan was constructed. In this section,
I will specify the details of a model of collaborative planning that will be used
as the basis of the dialogue simulation testbed in which the hypotheses can
be tested. In thinking about models of collaborative planning, I have found it
useful to consider models in terms of 6 features:

1. agent architecture 

2. role of resource limits: whether the agents constructing the collaborative 
plan have limited resources, and thus whether there is an attempt to either 
maximize or minimize any aspect of resource consumption, and if so which 
aspects. 

3. the mutual belief model: whether the function of the dialogue is to es- 
tablish mutual beliefs, and whether the mutual belief model is binary or 
allows for defaults in mutual beliefs. 

4. utterance act types: types of acts available for agents to communicate 

with other agents and the effects of each act on the cognitive state of the 
agents and the collaborative planning process. 

5. mixed-initiative: whether one agent is the initiator or both agents have
equal initiative.^ 

6. plan evaluation: how the collaborative plan is evaluated and what factors 
determine how good the collaborative plan is. 

Most accounts of collaborative planning dialogues are not specific about
all of these features, although some accounts provide rich models of particular
features. For example, Smith and Guinn provide a richer model of
mixed-initiative than that provided here [Guinn, 1994, Smith et al., 1992], and
there are precise models of how hearers infer the utterance act type or the
intention underlying a particular communicative action [Allen, 1983,
Sidner, 1985, Litman and Allen, 1990, Traum, 1994]. However, to my knowledge no
previous work has included a specification of the agent architecture, the
relationship of the architecture to language behavior, the role of resource
limits, and the plan evaluation process. The remainder of this section provides
a specification for each of these features.



3.1 Agent Architecture, Mutual Belief and Resource Limits

Both the agent architecture and the role of resource limits are addressed by
adopting an agent architecture based on the IRMA architecture for
resource-bounded agents, shown in figure 3 [Bratman et al., 1988, Pollack and
Ringuette, 1990]. The IRMA architecture has not previously been used to model
the behavior of agents in dialogue. The basic components of the modified IRMA
architecture are:



• Beliefs: a database of an agent's beliefs. This includes beliefs that an 
agent believes to be mutual to some degree. 

• Belief deliberation: decides what an agent wants to believe when there is 
conflicting evidence. 

• Intentions: a database of an agent's intentions. This includes intentions 
that an agent believes to be mutual to some degree. 

• Plan Library: what an agent knows about plans as recipes to achieve 
goals. 

• Means-end reasoner: reasons about how to fill in existing partial plans, 
proposing options that serve as subplans for the plans an agent has in 
mind. 

• Filtering Mechanism: checks options for compatibility with the agent's
existing plans. Options deemed compatible are passed along to the
deliberation process.†

• Desires: Agents may have different types of desires but here I assume that
their only desire is to maximize utility [Doyle, 1992].

• Intention Deliberation: decides which of a set of options to pursue (by an
evaluation based on desires such as maximizing utility).

• Attention/Working memory (AWM): the limited attention module constrains
working memory and the retrieval of current beliefs and intentions
that are used by the means-end reasoner.

*This is also called 'control' [Whittaker and Stenton, 1988, Smith, 1980].
Walker and Whittaker and Guinn argued that the distribution of control in
natural dialogue is primarily determined by whether information relevant to the
task is distributed between the agents or primarily known by one agent
[Walker and Whittaker, 1990, Guinn, 1993].

†The filtering mechanism presented in [Bratman et al., 1988] and used in
Tileworld is more complex than that presented here because that work explored
the issue of when current intentions get over-ridden.

For the purpose of modeling dialogue, the architecture has been extended 
with a model of mutual belief that allows for different degrees of mutual belief. 
For the purpose of exploring the effects of resource-bounds on attention, this 
architecture has been extended with a model of limited attention. All of the 
modules are standard except for the AWM module described in detail below, and 
the mutual belief module which will be briefly described. 



Attention/Working memory model. The model of limited AWM is a cognitively
based model adapted from [Landauer, 1975], which fits many empirical
results on human memory and learning [Hellyer, 1962, Landauer, 1969,
Collins and Quillian, 1969, Sternberg, 1967, Tulving, 1967, Anderson and
Bower, 1973, Solomon, 1992]. The motivation for using a cognitively based
model of AWM is to model the behavior of agents in naturally occurring
dialogues and to test a theory of collaborative communication with humans.*

The key properties of the model are that (1) limits on AWM are a parameter
of the model and can be varied to explore different limits [Baddeley, 1986];
(2) items encountered more recently are more likely to be salient
[Landauer, 1975]; (3) items encountered more frequently are more likely to be
salient [Hintzmann and Block, 1971]. These recency and frequency effects are a
key aspect of the AWM model for testing the hypothesized functions of IRUs.
Below I will discuss a particular implementation of this model and its role in
testing the hypotheses.

AWM is modelled as a three dimensional space in which propositions acquired
from perceiving the world are stored in chronological sequence according to
the location of a moving memory pointer. The sequence of memory loci used
for storage constitutes a random walk through memory with each locus a short
distance from the previous one. If items are encountered multiple times, they
are stored multiple times [Hintzmann and Block, 1971]. The fact that the
sequence of storage locations is random means that the recency and frequency
effects are stochastically determined. This means that when this model is used
in simulation, the simulation produces different results each time.

When an agent retrieves items from memory, search starts from the current
pointer location and spreads out in a spherical fashion. The resource limited
aspect of AWM follows from the fact that search is restricted to a particular
search radius defined in Hamming distance. For example, if the current memory
pointer locus is (0 0 0), the loci at distance 1 would be (0 1 0) (0 -1 0)
(0 0 1) (0 0 -1) (-1 0 0) (1 0 0). The actual locations are calculated modulo
the memory size.
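
The storage and retrieval scheme just described can be sketched as follows. This is only an illustration under stated assumptions, not the testbed's implementation: the memory is taken to be a cube of side `size`, the random walk moves one unit along one randomly chosen axis per stored item, and distance is computed as the sum of per-axis offsets with wraparound, which matches the six neighbours listed above for distance 1. The class and method names are hypothetical.

```python
import random


class AWM:
    """Toy attention/working memory: a 3-D grid with a wandering pointer."""

    def __init__(self, size=16, seed=None):
        self.size = size
        self.pointer = (0, 0, 0)
        self.cells = {}  # locus -> list of propositions stored there
        self.rng = random.Random(seed)

    def _step(self):
        # Assumption: move one unit along a randomly chosen axis, modulo size.
        axis = self.rng.randrange(3)
        delta = self.rng.choice((-1, 1))
        p = list(self.pointer)
        p[axis] = (p[axis] + delta) % self.size
        self.pointer = tuple(p)

    def store(self, proposition):
        # Each item is stored at the next locus of the random walk. An item
        # encountered again is stored again, giving the frequency effect.
        self._step()
        self.cells.setdefault(self.pointer, []).append(proposition)

    def _distance(self, a, b):
        # Sum of per-axis offsets, taking the shorter way around each axis.
        return sum(min((x - y) % self.size, (y - x) % self.size)
                   for x, y in zip(a, b))

    def salient(self, radius):
        # Everything stored within `radius` of the current pointer location.
        found = set()
        for locus, items in self.cells.items():
            if self._distance(locus, self.pointer) <= radius:
                found.update(items)
        return found
```

With a small `radius` only recently or frequently stored propositions tend to be returned; raising `radius` to the maximum possible distance makes every stored proposition salient, which corresponds to an agent without attentional limits.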

The limit on the search radius defines the subset of the belief and intentions 
database that is salient. In addition, the fact that the pointer moves has the 
effect that the salient subset is always changing. Effectively, as new facts are 
added, others are displaced and become no longer salient, so that the salient 
predicate is dynamic. 

The search radius limit defines the AWM parameter that will be varied in
the experiments in section 5 in order to test the effect of different resource
limitations. Experiments that Landauer performed showed that, for a task
requiring remembering whether a word belonged to a list of words, the model
can be parameterized so that an AWM of 7 reproduces the human performance
results in [Hellyer, 1962]. Since no systematic tests have been performed for
human performance on the collaborative planning tasks investigated below, the
experiments are run at low, mid and high AWM settings. Human performance
is assumed to fall somewhere in the middle of these ranges.

*Some of the features of the model hold for processors in general, such as the feature that
items that have been discussed more recently are more likely to be accessible with little effort,
and that incoming information can displace other information from working memory.



Using AWM to implement the discourse inference constraint. The model
of Attention/Working Memory (AWM) provides a means of testing the
hypothesized DISCOURSE INFERENCE CONSTRAINT introduced in section 2.
Remember that the discourse inference constraint states that inferences in
discourse are derived from propositions that are currently in working memory.
The AWM model, as shown in figure 3, limits the beliefs accessible for means-end
reasoning and deliberation to the subset of beliefs that are currently in AWM
[Hayes-Roth and Thorndyke, 1979, Joshi, 1978, Joshi et al., 1984, Norman and
Bobrow, 1975]. These beliefs are defined as being salient.

This model contrasts with the standard view of inference, where if an agent
believes P and believes P → Q, then the agent believes Q. The discourse
inference constraint provides a principled way of limiting inference in
modeling humans by requiring the premises P and P → Q to be salient. An
axiomatization requires the predicate salient and inference rules as follows
for each inference rule schema [Walker, 1993b, Hobbs, 199]:

Inference under the Discourse Inference Constraint:

$$\mathrm{Say}(A,B,P) \rightarrow \mathrm{Salient}(B,P)$$

$$\frac{\mathrm{Salient}(B,P) \wedge \mathrm{BEL}(B,P) \wedge \mathrm{Salient}(B,P \rightarrow Q) \wedge \mathrm{BEL}(B,P \rightarrow Q)}{\mathrm{BEL}(B,Q)}$$

The first inference rule states that whenever agent A says an utterance to B
that realizes proposition P, P becomes salient for B. The second rule states
that whenever a proposition P and an inference rule, P → Q, are both believed
by B and salient for B, then B can use them to infer Q. The model of AWM must
be consulted to determine when the salient predicate holds.
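
As a rough illustration of how the two rules interact, the sketch below applies modus ponens only when both the premise and the rule are believed and salient. The representation (propositions as strings, a rule P → Q as a pair, and the function names `infer` and `hear`) is an assumption made for this example and is not taken from the testbed.

```python
def infer(beliefs, rules, salient):
    """Return the conclusions Q licensed under the discourse inference constraint.

    beliefs: set of propositions the agent believes
    rules:   set of (P, Q) pairs standing for believed rules P -> Q
    salient: set of propositions and (P, Q) rules currently in working memory
    """
    derived = set()
    for (p, q) in rules:
        premise_ok = p in beliefs and p in salient   # Salient(B,P) and BEL(B,P)
        rule_ok = (p, q) in salient                  # Salient(B,P->Q); BEL holds by membership in rules
        if premise_ok and rule_ok:
            derived.add(q)
    return derived


def hear(salient, proposition):
    # Say(A, B, P) makes P salient for the hearer B.
    return salient | {proposition}
```

An agent that believes both P and P → Q but has neither in working memory infers nothing; once the rule is salient and another agent says P, the conclusion Q becomes available. This is the mechanism by which an utterance that adds no new information can still enable an inference.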



Mutual belief model. The model of mutual belief is based on Lewis's Shared
Environment model of mutual belief [Lewis, 1969, Clark and Marshall, 1981,
Barwise, 1988], extended to support different degrees of mutual belief by
tagging beliefs with qualitative endorsements at the time that they are formed
and stored in the beliefs database [Cohen, 1985, Gardenfors, 1988,
Galliers, 1991a]. Different degrees of mutual belief allow some actions to be
left to inference and some inferences to be defaults. This makes it possible to
distinguish between the explicit acceptance of a proposal and the acceptance of
a proposal inferred in the absence of evidence to the contrary. When agents are
not logically omniscient, it is possible to distinguish between mutual beliefs
about what has been mutually inferred and information that has been discussed
in the dialogue. See [Walker, 1992, Walker, 1993b] for more detail.



3.2 Discourse Acts, Utterance Acts, and Mixed Initiative 

The overall structure of the discourse in collaborative planning dialogues is
primarily determined by the task structure [Power, 1974, Grosz, 1977,
Litman, 1985, Sibun, 1991]. Each subpart of the task consists of a dialogue
segment in which agents negotiate what they should do for that part of the
task.

Figure 4: Finite State Model of Discourse Actions

As discussed in section 2.1, we wish to abandon the NO AUTONOMY ASSUMPTION.
The model should allow either agent to initiate the dialogue or initiate a
subdialogue about a new part of the task [Walker and Whittaker, 1990,
Dahlback, 1991]. For each agent to be able to do this, knowledge about the
task must be distributed between the participants so that each participant has
a basis for means-end reasoning and deliberation.

Furthermore, agents should be able to ACCEPT or REJECT one another's
proposals. Each plan step contributing to a higher level goal must remain open
for negotiation even if both agents are committed to coming up with a
collaborative plan for the higher level goal. This contrasts with models in
which proposals for each plan substep that the initiator makes must be accepted
by the non-initiator, once the non-initiator has agreed to work on a
collaborative plan [Cohen and Levesque, 1991, Grosz and Sidner, 1990].



Discourse Acts and Mixed Initiative. To engage in collaborative planning,
agents take turns sending messages, and each turn may consist of one or more
DISCOURSE ACTS. Discourse acts are OPENING, CLOSING, PROPOSAL, ACCEPTANCE,
REJECTION and CLARIFICATION. These are higher level acts that are
composed of primitives called utterance acts, which will be described below.

The schema of discourse actions shown in figure 4 controls the sequence of
discourse acts and which discourse acts can be combined into a single turn.*
The discourse act schema is the basis of an algorithm by which agents achieve
a COLLABORATIVE-PLAN. For each step in the domain plan:

1. individual agents perform means-end reasoning about options in the domain;

2. individual agents deliberate about which options are preferable;

3. then one agent initiates a subdialogue consisting minimally of a PROPOSAL
to the other agent, based on the options identified in a reasoning cycle,
about actions that contribute to the satisfaction of their goals;

4. then the proposal may be subject to CLARIFICATION, after which it is
either ACCEPTED or REJECTED by the other agent, by calculating whether
it maximizes utility.

*This schema cannot describe all discourse action transitions in every type of
dialogue [Levinson, 1979, Levinson, 1981, Schegloff, 1987]. One required
extension is to allow multiple proposals to be simultaneously open
[Rose et al., 1995, Sidner, 1994].

This algorithm ties the discourse act schema in figure 4 to the IRMA
architecture. The requirement that agents must indicate whether they accept or
reject each proposal follows from replacing the assumption of cooperativity in
earlier work [Allen and Perrault, 1980, Grosz and Sidner, 1990] with the
COLLABORATIVE PRINCIPLE:



COLLABORATIVE PRINCIPLE: Conversants must provide evidence of
a detected discrepancy in belief as soon as possible.



The COLLABORATIVE PRINCIPLE was proposed in [Walker, 1992], and is an
abstraction of the COLLABORATIVE PLANNING PRINCIPLES of Whittaker and
Stenton (1988) and Walker and Whittaker (1990). The COLLABORATIVE PRINCIPLE
means that speakers must monitor the next action by the hearer in order
to detect the effects of their utterances. If the hearer continues the dialogue
and provides no evidence of a belief discrepancy, the inference of acceptance is
licensed as a default [Galliers, 199, Walker, 1992].

The fact that agents evaluate both assertions and proposals before deciding
what to believe or intend follows from the IRMA agent architecture in figure 3.
As the figure shows, incoming messages about intentions and beliefs are subject
to intention or belief deliberation. This provides the basis for abandoning the
NO AUTONOMY ASSUMPTION while specifying why an agent would accept or reject
another agent's proposal (see also [Galliers, 1989, Galliers, 1991b,
Doyle, 1992]).

Agents evaluate assertions and proposals from other agents by assessing the
support for assertions and the warrants for proposals. Finally, as figure 3
shows, this evaluation takes place under constraints of limited working memory,
since the beliefs that can serve as supports or warrants must be salient for
these processes to use them.
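
The proposal and evaluation steps of the algorithm given earlier in this section can be sketched roughly as below. The sketch makes several assumptions that the text does not fix: options are plain strings, an option's warrant is usable only when the option is in the `salient` set, and a hearer who cannot retrieve a better salient alternative accepts by default. The function names are hypothetical.

```python
def best_option(options, utility, salient):
    """Deliberate: pick the highest-utility option whose warrant is salient."""
    usable = [o for o in options if o in salient]
    return max(usable, key=lambda o: utility[o]) if usable else None


def respond(proposal, own_options, utility, salient):
    """Hearer's side: accept unless a salient alternative has higher utility."""
    alternative = best_option(own_options, utility, salient)
    # The warrant for the proposal must itself be retrievable to compare.
    proposal_known = proposal in salient
    if (alternative is not None and proposal_known
            and utility[alternative] > utility[proposal]):
        return ("reject", alternative)  # the rejection carries a counter-proposal
    return ("accept", proposal)
```

Under these assumptions, whether a proposal is rejected depends not only on the utilities but on what the hearer can currently retrieve: the same proposal is accepted when the better alternative has been displaced from working memory.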



Utterance Acts. Figure 4 shows the discourse act schema that provides the
basis for dialogue. Discourse acts are composed of utterance acts, which are the 
primitive acts that an agent can actually perform. Each discourse act can be 
performed in different ways by varying the number and type of utterance acts 
that it consists of. For example, a proposal may or may not include additional 
information that can convince the hearer, as in example pb[ . 

There are seven utterance act types: OPEN, CLOSE, PROPOSE, ACCEPT,
REJECT, ASK and SAY, which are realized via the schemas below:

(Propose ?speaker ?hearer ?option)
(Ask ?speaker ?hearer ?belief)
(Say ?speaker ?hearer ?belief)
(Accept ?speaker ?hearer ?option)
(Reject ?speaker ?hearer ?belief)
(Reject ?speaker ?hearer ?option)
(Open ?speaker ?hearer ?option)
(Close ?speaker ?hearer ?intention)



The content of each utterance act can be an option or an intention
representing a domain plan act constructed by means-end reasoning and
deliberation as shown in figure 3. An option is an act that has not been
committed to by both agents. An intention is an act that has been committed to
by both agents [Pollack and Ringuette, 1990, Bratman et al., 1988]. An option
only becomes an intention in the collaborative plan if it is accepted by both
agents, either explicitly or implicitly. The option in a REJECT schema is a
counter-proposal and what is rejected is the current proposal.

The content of an utterance act may also be a belief. These beliefs are either
those that an agent starts with, beliefs communicated by the other agent, or
inferences made by the agent during the conversation. Beliefs in ASK actions
have variables that the addressee attempts to instantiate. The belief in a
rejection schema is a belief that the speaker believes is a reason to reject the
proposal, such as a belief that the preconditions for the option in the proposal
do not hold [Walker and Whittaker, 1990].

Examples of these utterance acts in dialogue will be given in section 4. Below,
I specify how B processes each of the 7 messages that A can send. In the effects
specified below, Store means store in AWM, for eventual long-term storage in the
beliefs database. The processing involved with each incoming message should
be understood with reference to the IRMA agent architecture.

1. Agent A: (Propose ?speaker ?hearer ?option)
Agent B:

(a) Filter: Check whether ?option is compatible with current beliefs, e.g.
that no current beliefs contradict its preconditions.

(b) Infer and Store the preconditions of ?option

(c) Means-End Reason (ME-Reason) about Intention the ?option
contributes to.

(d) Deliberate by evaluating the ?option against other options generated
by Means-End reasoning.

(e) Indicate results of deliberation by an Accept or Reject.

2. Agent A: (Ask ?speaker ?hearer ?belief);
Agent B: retrieve beliefs matching ?belief from Memory and respond
with (Say ?speaker ?hearer ?belief) with the variable instantiated for each
matching Belief.

3. Agent A: (Say ?speaker ?hearer ?belief);
Agent B: Store ?belief

4. Agent A: (Accept ?speaker ?hearer ?option)*
Agent B:

(a) Store (intend A B ?option)†

(b) Store (Act-Effects ?option)

5. Agent A: (Reject ?speaker ?hearer ?option)
Agent B:

(a) Deliberate ?option in comparison with own current proposal that was
rejected.

(b) Accept ?option if better than your current proposal.

(c) If rejecting ?option then reject with reason for rejection.

6. Agent A: (Reject ?speaker ?hearer ?belief);
Agent B: Store ?belief

7. Agent A: (Open ?speaker ?hearer ?option);
Agent B: Mark the discourse segment that matches ?option as open

8. Agent A: (Close ?speaker ?hearer ?intention);
Agent B: Close the discourse segment for ?intention

*This is a simplification since the form of the acceptance determines the endorsement type
on the mutual belief that is added to the beliefs database.

†This represents that both agents are committed to the option while the binding of the
?option specifies the agent who will execute the option.

These acts and their effects determine the structure of the dialogue and its 
effect on the mental state of the conversants. 
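
The simpler hearer-side effects listed above (Say, Accept, Open and Close) can be sketched as a small dispatcher. This is a minimal illustration, not the testbed's code: the `Utterance` and `Hearer` types are hypothetical, a plain list stands in for AWM storage, and a discourse segment is identified directly by the content of the Open or Close act. Propose, Ask and Reject are left out because they require filtering, retrieval and deliberation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Utterance:
    act: str         # "propose", "ask", "say", "accept", "reject", "open", "close"
    speaker: str
    hearer: str
    content: object  # an option, a belief or an intention


@dataclass
class Hearer:
    name: str
    awm: list = field(default_factory=list)          # stand-in for AWM storage
    open_segments: set = field(default_factory=set)

    def store(self, item):
        self.awm.append(item)

    def process(self, msg):
        if msg.act == "say":
            # Effect 3: store the communicated belief.
            self.store(msg.content)
        elif msg.act == "accept":
            # Effect 4: both agents are now committed to the option,
            # and the effects of the act are recorded.
            self.store(("intend", msg.speaker, self.name, msg.content))
            self.store(("act-effects", msg.content))
        elif msg.act == "open":
            self.open_segments.add(msg.content)
        elif msg.act == "close":
            self.open_segments.discard(msg.content)
        else:
            # Propose, Ask and Reject need deliberation; not sketched here.
            raise NotImplementedError(msg.act)
```

Keeping each utterance act as an immutable record and putting all state changes in the hearer mirrors the description above, in which every act is defined by its effect on the hearer's mental state.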



3.3 Plan Evaluation 

There are three components of the plan evaluation process that have different 
effects on collaborative planning. Two features are related to the task definition 
and the third to the model of evaluation applied: 

1. the degree of belief coordination: whether some or all of the intentions 
associated with a plan must be mutually intended and whether any beliefs 
related to the intended acts must also be mutually believed; 

2. task determinacy and fault tolerance: whether the task has only one so- 
lution, or is fault tolerant or more or less satisfiable. 

3. the model must specify what is to be optimized and whose resource con- 
sumption is to be minimized for performance evaluation. 

Different theories of collaborative planning reflect different views of the
degree of belief coordination required for agents to have a collaborative plan.
The minimal approach is to not require the agents to establish mutual beliefs
at all [Durfee et al., 1994, Guinn, 1994]. Rather, agents divide up the plan
into subcomponents and separately plan each component, without requiring
agreement on how the subcomponents are planned. At the next level of belief
coordination, it is common to require the intended acts of the collaborative
plan to be mutually intended [Grosz and Sidner, 1990, Traum, 1994, Levesque et
al., 1990, Thomason, 1990b]. At the highest level of belief coordination, the
agents must both mutually intend all intentions and mutually believe any
beliefs that support the plan such as the warrant beliefs that provide reasons
for adopting a step of the plan. In addition, it is possible to require that
inferences about other goals that the intended actions will satisfy should also
be mutually believed.

In this work, the assumption is that the degree of belief coordination required
is a feature of the task. The minimal level in the experiments discussed below
will be that all intentions must be mutually intended, and the experiments will
vary whether warrants, and inferred intentions that are derived from explicitly
discussed intentions, must also be mutual.

Task determinacy and fault tolerance have an effect on communicative choice
in collaborative planning because they are directly related to how much
uncertainty is tolerable in the plan. If a partial plan has some utility, then
making a mistake or only constructing a partial plan is not catastrophic. For
task determinacy, I assume that a measure of the quality of the final plan can
be determined from the utility of each step in the plan, and that partial plans
can also be evaluated, so that the task is more or less satisfiable.

With respect to evaluating performance, I assume that the agents in a
collaborative planning dialogue are working as a team, and as a team they
attempt to optimize the team's performance and minimize the team's consumption
of resources. This follows from Clark's assumption that conversants in
dialogue attempt to achieve their dialogue purpose with LEAST COLLABORATIVE
EFFORT [Clark and Schaefer, 1989, Clark and Wilkes-Gibbs, 1986, Clark and
Brennan, 1990]. This approach contrasts with other approaches in which agents
only participate in communication to the degree that it maximizes their own
expected utility [Durfee et al., 1994].

A final choice has to do with which processes collaborative effort consists
of. A common assumption is that the number of utterances is the primary
efficiency measure [Grice, 1975, Chapanis, 1975]; this is the FEWEST UTTERANCES
ASSUMPTION. Since all types of IRUs violate this assumption, in this work
collaborative effort is defined with reference to the agent architecture and to
all the processes required in collaborative planning, i.e. (1) retrieval
processes necessary to access previously stored beliefs in memory; (2)
communicative processes related to generating and interpreting utterances; and
(3) reasoning processes that operate on beliefs stored in memory and those
communicated by other agents. With respect to the IRMA architecture (figure 3),
retrieval processes are those that access AWM, the plan library and the beliefs
and intentions databases, communicative processes are the modules for
perception and generation of messages, and inferences are the combined
processes of deliberation, means-ends reasoning, and filtering. Collaborative
effort includes the costs for both agents for all of these processes:



COLLABORATIVE EFFORT =
(the total cost of communication for both agents)
+ (the total cost of inferences for both agents)
+ (the total cost of retrievals for both agents)

Collaborative effort is defined for the whole dialogue and not on a per
utterance basis. This definition and the other assumptions support the
specification of the plan evaluation process. Given the above definitions,
performance is the difference between a measure of the quality of the problem
solution and COLLABORATIVE EFFORT.



PERFORMANCE = QUALITY OF SOLUTION - COLLABORATIVE EFFORT



Since the agents' desires are simply to maximize utility, the quality of the 
solution is measured by the utility of the resulting plan with respect to the 
agents' utility functions. 
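
The two definitions above can be put together in a short sketch. It assumes, for illustration only, that each process type has a fixed unit cost, that effort is the cost-weighted count of operations summed over both agents, and that quality of solution is the sum of the utilities of the plan steps; the function and parameter names are hypothetical.

```python
def collaborative_effort(counts, costs):
    """Total effort over the whole dialogue, summed over both agents.

    counts: {agent: {"communication": n, "inference": n, "retrieval": n}}
    costs:  {"communication": c, "inference": c, "retrieval": c}
    """
    return sum(costs[kind] * n
               for per_agent in counts.values()
               for kind, n in per_agent.items())


def performance(step_utilities, counts, costs):
    # Assumption: quality of solution is the sum of the utilities of the
    # steps in the final (possibly partial) plan.
    quality = sum(step_utilities)
    return quality - collaborative_effort(counts, costs)
```

Varying the entries of `costs` changes which strategies look efficient: a strategy that adds utterances but saves retrievals scores better when retrieval is expensive relative to communication, and worse when it is cheap.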



4 Design-World

4.1 Methodological Basis of Design-World

Design-World is a testbed for a theory of collaborative communication, which
instantiates the model of collaborative planning in dialogue discussed in
section 3. In order to motivate the use of the Design-World testbed in
developing and testing a defeasible theory of communication in collaborative
planning, this section first describes Design-World as an instance of a general
method, and then describes the testbed and its implementation as well as the
task and communication parameters. The method can be characterized by the steps
below:

1. Generate hypotheses about the features of a model of collaborative plan- 
ning dialogues from a statistical analysis of human-human dialogue cor- 
pora. 

2. Produce a functional characterization of the model, specifically including 
the parameters that could affect task outcome, or claims about the efficacy 
of the model. 

3. Implement the model as a testbed so that (some of) these parameters
can be controlled, while using independently motivated modules for other
aspects of the testbed.

4. Test the hypotheses and the resulting model against different situations 
controlled by parameter settings. 

The hypotheses that were generated by the statistical analysis of the dialogue
corpora were discussed in section 2. These hypotheses are roughly that under
constraints of resource bounds, task inferential complexity, task fault tolerance,
and task requirements for belief coordination, communicative choices to include
IRUs can reduce collaborative effort or increase the quality of solution of the
collaborative plan.

The next step is to produce a functional characterization of the model (a
partial formalization). In section 3, I discussed features of a model of
collaborative planning that interact with an agent's autonomy, resource limits
and communicative choices. In nonexperimental work, the model is the final
result of the research. However, this leaves the model and the claims that
motivated the model empirically unverified. In formal characterizations, many
simplifying assumptions need to be made, and it is not always clear that the
results carry over to complex domains where the simplifying assumptions do not
hold. While the model presented here is empirically based on statistical
analysis of a corpus of naturally occurring dialogues, many of the hypotheses
discussed above are related to models of agents' processing. Corpus analysis
can only provide weak support for these hypotheses. Thus another source of
empirical verification is desirable in order to develop a well-specified and
defeasible theory. This is the motivation for the Design-World testbed.

Next, it is necessary to consider the parameters that could affect the outcome
or claims about the efficacy of the model and then implement the model as a
testbed so that at least some of these parameters can be controlled. The use
of independently motivated modules for other aspects of the testbed guarantees
that the testbed actually tests something, and also makes it less likely that
the testbed is a case of 'experimentation in the small' [Hanks et al., 1993].

Parameters that affect the efficacious use of IRUs have already been
discussed: these include resource bounds, task inferential complexity, and
requirements for belief coordination. The AWM model introduces a parameter for
resource bounds, and is implemented as part of the IRMA architecture for
resource limited agents, which is independently motivated [Bratman et al., 1988,
Pollack and Ringuette, 1990]. The AWM model itself is also independently
motivated, since it reproduces in simulation many well known results on human
memory and learning [Landauer, 1975, Solomon, 1992]. In addition, the
utterance acts and their effects are independently motivated by other research
on collaborative planning dialogues and by the statistical analysis of dialogue
corpora [Whittaker and Stenton, 1988, Walker and Whittaker, 1990, Sidner, 1994,
Stein and Thiel, 1993, Reithinger and Maier, 1995, Carletta, 1992, Chu-Carroll
and Carberry, 1995, Traum, 1994].

To introduce parameters related to tasks, the testbed is designed around
a simple task where two agents must form a collaborative plan as to how to
arrange some furniture in the rooms of a two room house. The task is based on
cooperative design tasks used for experiments on distributed cooperative work
for which a corpus of dialogues was available [Whittaker et al., 1993]. However,
the simple task can be varied along three dimensions: (1) inferential complexity;
(2) degree of belief coordination required; (3) tolerance for errors and usefulness
of partial solutions.

These three task dimensions represent very different tasks. For example,
varying the task by increasing inferential complexity provides information on the
performance of agent communication algorithms in simple versus inferentially
complex tasks. The dimensions enable us to generalize from the specific task
in Design-World to real world tasks. Section 4.2 will introduce the Standard
version of the task, and then Section 4.4 will describe the task variations.



To introduce parameters related to the interaction of communicative choice
with task complexity and resource limits, agents are designed so that they vary
their communication strategies to include or not include IRUs. Section 4.5
will describe the communicative choice parameters that will be used in the
experimental results presented in section 5.

The experiments reported in section 5 will examine the interaction of three
factors: (1) resource limits; (2) communicative strategies; and (3) task
definition. The experiments in the testbed have several functions: (1) they
demonstrate that the model can be implemented; (2) they highlight potential
flaws in the model; and (3) they provide empirical verification of hypotheses
about the function of particular communicative strategies beyond that provided
by corpus analysis and researcher's intuitions. This section describes the
domain, the implementation of the collaborative planning model in this domain,
the communicative strategies and the task variations.



4.2 Design-World Collaborative Planning Domain

In Design-World, two artificial parametrizable agents converse in order to agree
on the design of the floor plan of a two room house [Whittaker et al., 1993,
Walker, 1993]. The DESIGN-HOUSE plan requires the agents to agree on how
to DESIGN-ROOM-1 and DESIGN-ROOM-2. At the beginning of the simulation,
both agents know the structure of the DESIGN-HOUSE plan and that it requires
four furniture items in each room. Each agent has 12 items of furniture that
can be used in the plan. A potential final collaborative design plan negotiated
via a (simulated) dialogue is shown in Figure 5.

Each furniture item has a value that is used both as a warrant to support
deliberation (the utility of an act that is under consideration), and as the basis
for calculating the quality of solution of the final plan, as discussed in
section 3.3. The values on the furniture items range from 10 to 56, and both
agents' furniture items range over these values. Since beliefs about these values
will be used to test hypotheses about the function of IRUs, both agents know
the values of all the furniture items at the beginning of the dialogue.

Figure 5: One Final State for Design-World Standard Task: Represents the
Collaborative Plan Achieved by the Dialogue, 434 points

Design-World agents negotiate a collaborative plan following the collaborative
planning algorithm discussed in section ^. Each agent carries out means-end
reasoning about the furniture items that can be used in the design. Means-end 
reasoning generates options - these options are the content of PROPOSALS to 
the other agent to put a furniture item into a room. This is illustrated by 
Dialogue 12, which includes the artificial language that the agents communicate 
with and a gloss automatically generated from that language in italics for part 
of designing room-1. 

(12) 1: BILL: First, put the green rug in the study. 

(propose agent-bill agent-kim option-10: put-act (agent-bill green rug 
room-1)) 

2: KIM: Then, let's put the green lamp in the study. 

(propose agent-kim agent-bill option-33: put-act (agent-kim green lamp 
room-1)) 

3: BILL: Then, let's put the green couch in the study. 

(propose agent-bill agent-kim option-45: put-act (agent-bill green couch 
room-1)) 

4: KIM: No, instead let's put the purple couch in the study. 

(reject agent-kim agent-bill option-56: put-act (agent-kim purple couch 
room-1)) 

5: BILL: Putting in the green couch is worth 56 

(say agent-bill agent-kim bel-265: score (option-45: put-act (agent-bill 
green couch room-1) 56)) 

6: BILL: It is better to put the green couch in the study. 

(reject agent-bill agent-kim option-56: put-act (agent-bill green couch 
room-1)) 

At the beginning of the dialogue, Agent-Kim has stored in memory the
proposition that (score green-rug 56). When she receives Bill's proposal as
shown in (12-1), she evaluates that proposal in order to decide whether to accept
or reject it. As part of evaluating the proposal she will attempt to retrieve the
score proposition stored earlier in memory. Thus the propositions about the
scores of furniture items are warrants for supporting deliberation.

As discussed in section ^, the agents retain their autonomy even though
the agents both want to agree on a plan for designing the house. Thus, on
receiving a proposal, an agent deliberates whether to ACCEPT or REJECT it
[Doyle, 1992, Walker, 1994b]. Proposals 1 and 2 are inferred to be implicitly
ACCEPTED because they are not rejected [Whittaker and Stenton, 1988,
Walker and Whittaker, 1990]. This follows from the COLLABORATIVE PRINCIPLE
discussed in section |^. If a proposal is accepted, either implicitly or
explicitly, then the option that was the content of the proposal becomes a mutual
intention that contributes to the final design plan [Power, 1984, Walker, 1992,
Sidner, 1994].



Agents REJECT a proposal if deliberation leads them to believe that they 
know of a better option, based on evaluating the utility of the competing op- 
tions they have generated by means-end reasoning. For example, in (12-4) Kim 
rejects the proposal in (12-3), for pursuing option-45, and proposes option-56 
instead. The form of the rejection as a counter-proposal is based on observations
about how rejection is communicated in naturally-occurring dialogue as codified
in the collaborative planning principles [Walker and Whittaker, 1990].
When an agent intends to reject another agent's rejection, as in 12-5 and 12-6, the
agent includes additional information to support its proposal. In 12-5, agent-bill
reminds agent-kim of the value of the green couch, before rejecting agent-kim's
proposal.
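
The accept/reject deliberation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the testbed's code: the function name respond_to_proposal, the dictionary of salient score propositions, and the default_score used when the warrant for the proposed option is not salient are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def respond_to_proposal(
    proposed: str,
    own_options: List[str],
    salient_scores: Dict[str, int],
    default_score: int = 0,
) -> Tuple[str, Optional[str]]:
    """Accept a proposed option, or reject it with a counter-proposal.

    salient_scores holds only the score propositions (warrants) the agent
    can currently retrieve. default_score is a hypothetical stand-in for
    the value of a proposal whose warrant is not salient.
    """
    proposed_score = salient_scores.get(proposed, default_score)
    # Competing options are the agent's own options whose warrants are salient.
    rivals = [o for o in own_options if o != proposed and o in salient_scores]
    if not rivals:
        # Only one option is known: accept, since any option beats doing nothing.
        return ("accept", None)
    best = max(rivals, key=lambda o: salient_scores[o])
    if salient_scores[best] > proposed_score:
        # Rejection takes the form of a counter-proposal of the better option.
        return ("reject", best)
    return ("accept", None)
```

With these assumptions, an agent that knows of a higher-valued option answers with a counter-proposal, while an agent with no competing option accepts by default.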

4.3 Agent architecture implementation in Design-World

The agent architecture used in the Design-World simulation environment is
the modified IRMA architecture, shown in figure |^ and discussed in section ^
[Bratman et al., 1988, Pollack and Ringuette, 1990]. The only aspects of the
architecture that are specific to Design-World are the plan library, the way that
AWM is implemented, and the way that belief deliberation is implemented.

For the experiments below, the total size of AWM is set to 16, but memory
is wrap-around, and there is no overwriting. If the path of the memory pointer
retraces its steps so that the current memory locus already has something stored
in it, the new item is simply added. Thus memory capacity is unbounded.

Since hypothesis A4 relates to the degree to which AWM is limited, we want
to be able to compare the performance of agents who are more or less attention
limited. Thus, all experiments make comparisons between different communicative
strategies over three ranges of AWM settings; the AWM search radius
parameter varies from LOW AWM (radius of 3 and 4), to MID AWM (radius of
6 and 7) to HIGH AWM (radius of 11 and 16). LOW AWM agents are severely
attention limited, whereas almost everything an agent knows is salient
for HIGH AWM agents.
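
A minimal sketch of such a memory is given below. Several details are assumptions made for illustration and are not spelled out in this section: the memory is taken to be a three-dimensional lattice of side 16, the pointer takes one random unit step before each item is stored, and the search radius is measured as a wrap-around city-block distance from the current pointer location. The class and method names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Locus = Tuple[int, int, int]


class AWM:
    """Wrap-around memory in which loci are never overwritten."""

    def __init__(self, size: int = 16, seed: Optional[int] = None) -> None:
        self.size = size
        self.rng = random.Random(seed)
        self.pointer: Locus = (0, 0, 0)
        # Each locus keeps a list, so revisiting a locus simply adds an item.
        self.loci: Dict[Locus, List[str]] = defaultdict(list)

    def store(self, item: str) -> None:
        # Assumption: the pointer moves one step along a random axis,
        # wrapping around at the edges of the lattice.
        axis = self.rng.randrange(3)
        step = self.rng.choice((-1, 1))
        p = list(self.pointer)
        p[axis] = (p[axis] + step) % self.size
        self.pointer = (p[0], p[1], p[2])
        self.loci[self.pointer].append(item)

    def _distance(self, a: Locus, b: Locus) -> int:
        # City-block distance on a wrap-around lattice.
        return sum(min(abs(x - y), self.size - abs(x - y)) for x, y in zip(a, b))

    def salient(self, radius: int) -> List[str]:
        """Items stored within `radius` of the current pointer location."""
        return [
            item
            for locus, items in self.loci.items()
            if self._distance(locus, self.pointer) <= radius
            for item in items
        ]
```

Under these assumptions a small radius returns only items stored near the pointer, which are typically the recently stored ones, while a large radius returns nearly everything the agent has stored.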

The limits on AWM play a critical role in determining agents' performance.
Remember that only salient beliefs can be used in means-end reasoning and
deliberation, so that if the warrant for a proposal is not salient, the agent
cannot properly evaluate a proposal. However, if the agent only knows of one
option, the agent can accept the proposal on the assumption that any option is
better than doing nothing. Section ^ will show the impact of resource limits on
performance. For more detail see [Landauer, 1975, Walker, 1994b].



The implementation of belief deliberation, for the purpose of Design-World,
was tied directly to Landauer's AWM model. In that model, nothing stored
in memory is ever deleted or modified. Rather, new beliefs are added which
effectively compete with beliefs that are already present. Thus an agent's belief
deliberation process depends on collecting a set of related beliefs which may be
contradictory, and applying an algorithm to determine what the agent believes
(see [Galliers, 1990, Gärdenfors, 1990]). The fact that new information about
the state of the world supersedes old information is an emergent property
of the belief retrieval mechanism: beliefs recently added are more likely to be
retrieved. However, the stochastic aspect of retrieval means that it is possible
for an agent to decide to believe "out-of-date" propositions, and "forget" recent
changes in the world. As we will see in section ||, this means that, in cases where
the outdated beliefs were encountered with greater frequency and thus stored
in memory repeatedly, agents who can access all of their beliefs are more
likely to decide to believe out-of-date beliefs.
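
The frequency effect just described can be illustrated with a small sketch. The text does not specify the deliberation algorithm, so the rule used here is an assumption chosen only because it reproduces the effect: the agent adopts the value that occurs most often among the beliefs it retrieved, and breaks ties in favor of the most recently stored one. The function name decide_belief is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def decide_belief(retrieved: Sequence[Hashable]) -> Optional[Hashable]:
    """Pick one value from competing retrieved beliefs, listed oldest first.

    Assumed rule: the most frequent value wins; ties go to the most recent.
    """
    if not retrieved:
        return None
    counts = Counter(retrieved)
    top = max(counts.values())
    # Walk from newest to oldest so that ties favor recent beliefs.
    for value in reversed(retrieved):
        if counts[value] == top:
            return value
    return None
```

Under this rule an agent that retrieves three old copies of a belief and one recent contradiction keeps the outdated belief, whereas an agent that retrieves only the recent item adopts the current one.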

The plan library contains domain plans for Design-House and its subgoals,
as well as discourse plans for the discourse acts shown in figure 0. The discourse
plans will be discussed in detail in section 4.5.



4.4 Design-World Tasks

The Design-World task as a plan is simple since it involves linear planning of
subgoals which contribute to higher level goals, as shown in figure ^. However, 
the task is easily modified according to the general task features discussed above 
so that it is more difficult to perform well. These modifications are applicable 
to other tasks besides the testbed task, and affect the degree to which different 
aspects of the task contribute to the performance evaluation. 

There are 4 versions of the task that will be used to test the hypotheses
introduced in section ^: Standard, Zero-Nonmatching-Beliefs, Matched-Pair,
and Zero-Invalids. The Standard task is inferentially simple, fault tolerant and
requires low levels of belief coordination. The other tasks are more difficult
because they increase the degree of belief coordination required, and magnify
the effect of mistakes. The Zero-Nonmatching-Beliefs and Matched-Pair tasks
each explore different aspects of belief coordination and inferential complexity.
The Zero-Invalids task is fault intolerant.



Standard Task The Standard task provides a baseline and is inferentially
simple, fault tolerant and requires low levels of belief coordination. The Standard
task is defined so that the QUALITY OF SOLUTION for a particular dialogue
consists of the sum of the values of the furniture pieces for each valid step in the plan. In
addition, the task is defined so that partial solutions are possible. Any number 
of furniture items in a room is a valid plan, rather than requiring that each 
room must have all four furniture items. This choice about task determinacy 
makes it possible to see the gradient effect on performance of different resource 
restrictions. 

In addition the Standard task is fault tolerant. If agents make a mistake 
in planning and insert invalid steps in their collaborative plan, the point values 
for invalid steps in the plan are simply subtracted from the score. Thus in 
the Standard task agents are not heavily penalized for making mistakes due to 
inserting steps in plans that are not actually executable. 
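
The Standard scoring rule can be written down directly. The sketch below assumes that each plan step is represented as a (value, is_valid) pair and reads "subtracted from the score" as removing the points of invalid steps from the raw total; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def standard_quality(steps: Iterable[Tuple[int, bool]]) -> int:
    """Quality of solution for the Standard task.

    steps: (furniture value, step is valid) pairs; partial plans are allowed.
    """
    steps = list(steps)
    raw_total = sum(value for value, _ in steps)
    invalid_total = sum(value for value, ok in steps if not ok)
    # Points for invalid steps are simply subtracted from the score.
    return raw_total - invalid_total
```

A partial plan still earns the points for its valid steps, which is what makes the gradient effect of resource limits visible.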

Footnote: It is unclear whether this prediction of the belief deliberation algorithm is consistent with
human performance. However it is easy to think of examples of humans making the kind of
error that this model would predict. For example, I commonly believe (falsely) that I have
eggs at home in the refrigerator, even though I used them the previous evening for quiche.

Figure 6: Standard Version of the Task, Fault tolerant and Partial Solutions
acceptable.

The Standard task is inferentially simple because the agent's only inferences 
are those by means-end reasoning to generate options, those by deliberation to 
evaluate options, and act-effect inferences after committing to an action. Each of
these inferences relies on one premise: the premise (has ?agent ?item) supports
means-end reasoning, the premise (score ?item ?score) supports deliberation, 
and the premise (intend A B ?option) supports inferring the effect of ?option. 
Thus, none of these processes require multiple premises to be simultaneously 
salient. However, it is possible to test hypotheses about processing effort in 
the Standard task by making it easier to access these inferential premises. It is 
also possible to test the effect of resource limits since these premises must be 
accessible to perform optimally on the task. 

The degree of belief coordination in the Standard task is low because agents 
are only required to coordinate on the intentions corresponding to put-acts 
as shown in figure 6. These intentions are always explicitly discussed so that
coordination is always achieved. 

The Zero-Nonmatching-Beliefs task The Zero-Nonmatching-Beliefs task 
increases the degree of belief coordination by requiring agents to base their 
deliberation process on the same beliefs. They must have the same WARRANTS 
for adopting an intention in order to do well on this task. Figure 7 shows
the structure of beliefs about intentions and warrants for the Design-House 
goal. In the Zero-Nonmatching-Beliefs task, as shown, the warrants underlying 
intentions must also be mutually believed. This is not generally required in 
forming a collaborative plan because agents A and B can mutually believe that 
they have maximized utility without necessarily agreeing on what that utility 
is. Furthermore, in the general case, when agents have only one option under 
consideration, they do not need to evaluate the utility of that one option in 
order to decide whether to accept or reject it. 

The Zero-Nonmatching-Beliefs task provides a basis for testing hypotheses 
A1 and A5, introduced in section ^, by increasing the degree of belief coordination
required to perform well on the task, where the beliefs are those used in
deliberation. 

The Zero-Nonmatching-Beliefs task models particular types of real-world
tasks since it is not always necessary for agents to agree on the reasons for carrying
out a particular action. For example, in the negotiation between the union
and the management of a company, any agreement that is reached is agreed to
by each party for different reasons. An agreement for a shorter work week
is supported by the union because more overtime pay is possible for those who
want to work more and is supported by the management because the company's
insurance premiums will be lower. However, if two agents agree on a plan, but
have different reasons for doing so, they may change their beliefs and their intentions
under different conditions. The most stable, long-term, collaborative
plans will be those in which agents agree on both the actions to be performed
and the reasons for doing those actions. Under these conditions the agents
will be more likely to revise their intentions in a compatible way and intention
revision should be simpler. Thus the Zero-Nonmatching-Beliefs task examines
one extreme of belief coordination for deliberation.

Figure 7: Tasks can differ as to the level of mutual belief required. Some tasks
require that W, a reason for doing P, is mutually believed and others don't.

Matched Pair Tasks Another aspect of belief coordination has to do with 
coordinating beliefs based on inferences. There are two task definitions that in- 
crease inferential complexity by increasing the number of independent premises 
that must be simultaneously available in working memory. These are: (1) 
Matched Pair Same Room, and (2) Matched Pair Two Room. Figure 8 shows
the Matched Pair Two Room version of the task. Each intention to put a furni- 
ture item in a room can potentially contribute to another intention of achieving 
a matched pair goal. A Matched-Pair is two furniture items of the same color. 
The inference of a Matched-Pair is based on the minor premises shown in 13: 

(13) a. (Intend A B (Put ?agent ?item-1 ?room-1))

b. (Intend A B (Put ?agent ?item-2 ?room-2))

c. (Equal (Color ?item-1) (Color ?item-2))

Making this inference is more demanding for resource limited agents than
the processing needed in the Standard task. In the Standard task, in order to
agree on one step of the plan, the agents must access at least one belief about a
furniture item they have available. In order to properly evaluate the option represented
by that furniture item they must access the warrant for that option.
In contrast, in both Matched Pair tasks, the warrants must be
accessed for both furniture items that could contribute to a Matched-Pair goal.
In addition, the premises in 13 must also be accessed, and furthermore, each
version of the Matched-Pair task requires one additional premise.

Figure 8: Making additional Inferences: Matched Pair Two Room Task. Each
PUT intention contributes both to a Design-Room goal as well as a Matched
pair goal

The difference in the two Matched Pair tasks is whether the matches are in 
the same or different rooms. Matched-Pair Same Room requires the additional 
premise that (Equal ?room-1 ?room-2) while Matched Pair Two Room requires
the additional premise that NOT (Equal ?room-1 ?room-2). Because premise
13a is inferred and stored in memory at the time that a proposal is accepted, 
and because the agents always complete one room before starting another, the 
necessary premise shown in 13a is more likely to be salient in the Matched-Pair- 
Same-Room task. 
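
The Matched-Pair inference from the premises in 13, together with the extra room premise of each task version, can be sketched as follows. The PutIntention record and the function name are hypothetical, and the sketch assumes that all of the premises are available at once, which is exactly what a resource limited agent cannot count on.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PutIntention:
    agent: str
    item: str
    color: str
    room: str


def matched_pairs(
    intentions: Sequence[PutIntention], variant: str
) -> List[Tuple[PutIntention, PutIntention]]:
    """Infer Matched-Pairs from mutually intended put-acts.

    variant is "same-room" or "two-room", naming the additional premise.
    """
    if variant not in ("same-room", "two-room"):
        raise ValueError(f"unknown task variant: {variant}")
    pairs = []
    for a, b in combinations(intentions, 2):  # premises 13a and 13b
        if a.color != b.color:                # premise 13c
            continue
        same_room = a.room == b.room
        # Same Room needs equal rooms; Two Room needs different rooms.
        if (variant == "same-room") == same_room:
            pairs.append((a, b))
    return pairs
```

Each pair returned depends on two intention premises, one color premise and one room premise being accessible together, in contrast to the single premise needed for each inference in the Standard task.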

As discussed in sections |] and ||, we wish to provide a test of hypothesis 
A2, the discourse inference constraint, and examine how it affects the coordi- 
nation of inference in collaborative planning. Hypotheses A3 and A5 together 
imply that the complexity of inference should interact with the agent's ability 
to stay coordinated on inferences. Since our measure of inferential complexity 
is the number of independent premises required to draw an inference, both the 
Matched-Pair-Same-Room task and the Matched-Pair-Two-Room task increase
inferential complexity. 

Evaluating the quality of solution for the Matched-Pair tasks reflects the 
emphasis on coordinating on inferences, since both Matched-Pair tasks require 
that both agents make the matched pair inferences in order to score points for 
matched-pairs. The task measures how well agents are coordinated on the 
inferred intentions that follow from the intentions that were explicitly agreed 
upon. Only the intentions that contribute to Matched-Pairs are counted in the 
final solution, and the utility of these intentions is the sum of the utilities of the 
two furniture items, plus the utility of the Matched-Pair (50 points).

Figure 9: In Zero Invalids Task, invalid steps invalidate the whole plan.

Zero-Invalids Task The Zero-Invalids task is fault intolerant (see figure 9).
The assumption in the Zero-Invalids task is that any mistake invalidates the
whole plan. This is a feature of task determinacy: while there are still many
possible 8-step plans, all of the solutions with fewer than 8 items that would be
counted as valid solutions for the Standard task are not valid solutions for the
Zero-Invalids task.
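
The two stricter evaluation rules in this section, for the Matched-Pair tasks and for the Zero-Invalids task, can be sketched as below. The function names and data representations are hypothetical. The sketch also assumes that each furniture item belongs to at most one Matched-Pair and that an invalidated plan scores zero; neither assumption is stated in the text.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

MATCHED_PAIR_BONUS = 50  # utility of a Matched-Pair, as given in the text


def matched_pair_quality(
    pairs_a: Set[FrozenSet[str]],
    pairs_b: Set[FrozenSet[str]],
    value: Dict[str, int],
) -> int:
    """Score only the Matched-Pairs that both agents have inferred."""
    shared = set(pairs_a) & set(pairs_b)
    return sum(
        sum(value[item] for item in pair) + MATCHED_PAIR_BONUS for pair in shared
    )


def zero_invalids_quality(
    steps: Iterable[Tuple[int, bool]], required_steps: int = 8
) -> int:
    """Score a plan in which any invalid or missing step voids the whole plan."""
    steps = list(steps)
    if len(steps) < required_steps or any(not ok for _, ok in steps):
        return 0  # assumption: an invalidated plan earns no points
    return sum(value for value, _ in steps)
```

In the first rule an inference made by only one agent earns nothing, which is what ties the score to coordination on inferences. In the second, a single invalid step costs the agents the value of the entire plan rather than the value of that step alone.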

This task is an example of one extreme of fault intolerance. In general, how 
fault tolerant a task is depends on the interdependency of different subparts of 
the problem solution. For some tasks, a mistake can invalidate the whole solution;
for other tasks, partial solutions without the invalid step may be adequate.
For example, in a task like furnishing a room it may be desirable to have both 
a couch and a chair, but if the agents make a mistake and assume they can use 
a chair that will end up in a different room, the room is still partially furnished 
and usable. On the other hand, in a task such as building a tower, each step 
depends on the successful execution of the previous step and the whole plan 
may be invalid if a step to put down a foundation block cannot be executed. 

Note that an agent can reject another agent's proposal based on believing 
that it would add an invalid step to the plan, as shown in the rejection utterance 
act schema in section 3.2. Since agents have to agree on each step of the plan, 
an invalid step can only be inserted into the plan if both agents have failed to 
remember that the preconditions for the plan do not hold. 



4.5 Varying Communicative Strategies 

Section ^ discussed the discourse act schema that controls how agents participate
in dialogue, and discussed the types of utterance acts that the discourse acts are 
composed of. Which utterance acts a discourse act decomposes into depends on 
COMMUNICATIVE STRATEGIES which codify different communicative choices for 
how to do a particular discourse action. Agents are parameterized for different 
communicative strategies by placing different expansions of discourse plans in 
their plan libraries. 

Varying an agent's communicative strategies provides the basis for testing
the hypotheses about the potential benefits of IRUs. Varying the degree of explicitness
of a discourse act is the basis of the four communicative strategies
tested below: (1) All-Implicit; (2) Close-Consequence; (3) Explicit-Warrant;
and (4) Matched-Pair-Inference-Explicit. All of these strategies are hypothesized
to mitigate agents' attentional and inferential resource limits, under the
assumptions about their architecture and the definition of quality of solution
for the task.

Figures 10 through 13 show a plan operator for each strategy. These operators
draw on work by Walker and Rambow [Walker and Rambow, 1994], and
also make use of Moser and Moore's definitions of discourse acts and the integration
of Rhetorical Structure Theory (RST) and Grosz and Sidner's theory of discourse
[Mann and Thompson, 1987, Grosz and Sidner, 1986, Moser and Moore, 1992,
Moser and Moore, 1995]. The predicates in the plan operators are precisely defined
by the collaborative planning model and agent architecture discussed in
section ^. Each discourse act, such as a proposal, is composed of a CORE act
which represents the primary purpose of the act, such as a propose utterance act,
as well as a CONTRIBUTOR act, such as a warrant, whose purpose is to increase
the likelihood of achieving the intention of the core [Moore and Paris, 1993,
Young et al., 1994, Young and Moore, 1994].



NAME:        Proposal-All-Implicit (?speaker, ?hearer, ?act)
EFFECT:      (desire ?hearer (do ?hearer ?act) ?utility-act)
CONSTRAINTS: (and (option ?act)
                  (salient ?hearer (utility ?act ?utility-act)))
CORE:        (propose ?speaker ?hearer ?act)



Figure 10: The Proposal plan operator for an All-Implicit Agent 

All-Implicit Strategy The All-Implicit strategy is an expansion of a discourse
plan to make a PROPOSAL, in which a PROPOSAL decomposes trivially
to the communicative act of PROPOSE. See the plan operator in figure 10. This
strategy is the communicative choice shown in ^ in section |^, and provides a
baseline strategy that is consistent with the REDUNDANCY CONSTRAINT. The
experiments below will compare the performance of agents using the All-Implicit
strategy with the performance of agents using the other proposal strategies
discussed below.

In dialogue 12, both Design-World agents communicate using the
All-Implicit strategy, and the proposals are shown in utterances 1, 2, and 3. As
figure 10 shows, the All-Implicit strategy includes no additional information in
proposals, leaving it up to the other agent to retrieve it from memory.

The constraints on using the All-Implicit strategy are that (1) the proposed 
?act is an option generated by means-end reasoning and (2) that the utility is 
SALIENT to the hearer. In the experiments below, agents are parameterized to 
use this strategy consistently, so that an agent using the All-Implicit strategy 
assumes everything the hearer knows is always salient. The effect of the proposal 
is that the hearer will evaluate that proposal and deliberate the degree to which 
the hearer desires the act. However, whether the hearer will accept or reject 
the proposal depends on other options the hearer knows about. Clearly the 
speaker cannot predict these other options. Thus the effect of the proposal does 
not specify that the action will be intended by the hearer. This holds for all 
proposal operators. 

The All-Implicit strategy can be used by agents in any of the Design-World
tasks discussed in section 4.4, since the agents are capable of making inferences
or accessing memory to fill in what has been left implicit with this strategy.
Other inferences drawn by the hearer from the proposal utterance act are not
shown in figure 10. For example, an agent can use the All-Implicit strategy in
either of the Matched-Pair tasks, leaving it up to the other agent to infer which
other intention makes a match with the option currently under consideration.



NAME:        Close-Consequence (?speaker, ?hearer, ?act)
EFFECT:      (and (salient ?hearer (effect ?act ?effect))
                  (bel ?hearer (closed-segment ?act)))
CONSTRAINTS: (and (intend ?speaker ?hearer ?act)
                  (open-segment ?act))
CONTRIBUTOR: (say ?speaker ?hearer (effect ?act ?effect))
CORE:        (close ?speaker ?hearer ?act)



Figure 11: The Closing plan operator for a Close-Consequence Agent 

Close Consequence In dialogue 14, agent CLC uses the Close-Consequence
strategy. The plan operator for this strategy is shown in figure 11. The core
of the strategy is explicit closing statements, such as 14-2, on the completion 
of the intention associated with a discourse segment. A contributor to CLC's 
closing discourse act is an IRU such as 14-3: CLC makes the inference explicit 
that since they have agreed on putting the green rug in the study, they no longer 
have the green rug (act-effect inference). 

(14) 1: BILL: Then, let's put the green rug in the study.

(propose agent-bill agent-clc option-30: put-act (agent-bill green rug room-1))

2: CLC: So, we've agreed to put the green rug in the study.

(close agent-clc agent-bill intended-30: put-act (agent-bill green rug room-1))

3: CLC: And we no longer have green rug.

(say agent-clc agent-bill bel-48: hasn't (agent-bill green rug))

The Close-Consequence strategy of making inferences explicit at the close
of a segment models the naturally occurring example in ^. In both cases an
inference is made explicit that follows from what has just been said, and the inference
is sequentially located at the close of a discourse segment. This strategy
can be used by agents in any of the Design-World tasks discussed in section 4.4.

The Close-Consequence strategy will be used to test hypothesis C2 about
potential benefits of making inferences explicit, and will be contrasted with the
All-Implicit strategy where no closing acts are produced. Note that in dialogue 12
both agents go on to the next phase of the plan, leaving
the inference of both Acceptance and Closing for the other agent to make.
However, Close-Consequence is not a good test of other hypotheses because
in the experiments both agents always make act-effect inferences, and these
inferences are not difficult to make. See [Walker, 1995b] for a discussion of
experiments which vary an agent's capability to make these inferences.

Explicit Warrant The Explicit-Warrant strategy varies the proposal discourse
act by including WARRANT IRUs in each proposal. The plan operator
is given in figure 12 and exemplified by the dialogue excerpt in 15. Remember
that a WARRANT for an intention is a reason for adopting the intention, and
here warrants are the score propositions that give the utility of the proposal,
which are mutually believed at the outset of the dialogues. In 15, the warrant
IRU in 15-1 contributes to the proposal (core act) in 15-2.

NAME:        Proposal-Explicit-Warrant (?speaker, ?hearer, ?act)
EFFECT:      (and (desire ?hearer (do ?hearer ?act) ?utility-act)
                  (salient ?hearer (utility ?act ?utility-act)))
CONSTRAINTS: (and (option ?act)
                  (not-salient ?hearer (utility ?act ?utility-act)))
CONTRIBUTOR: (say ?speaker ?hearer (utility ?act ?utility-act))
CORE:        (propose ?speaker ?hearer ?act)

Figure 12: The Proposal plan operator for an Explicit-Warrant Agent

(15) 1: IEI: Putting in the green rug is worth 56

(say agent-iei agent-iei2 bel-2: score (option-2: put-act (agent-iei green rug room-1) 56))

2: IEI: Then, let's put the green rug in the study.

(propose agent-iei agent-iei2 option-2: put-act (agent-iei green rug room-1))

The plan operator in figure 12 specifies that an effect of using this plan is that
the utility of the proposal option is salient. Since warrants are used by the other
agent in deliberation, the Explicit-Warrant strategy can save the other agent the
processing involved with determining which facts are relevant for deliberation
and retrieving them from memory. A constraint on using the Explicit-Warrant
plan operator is that the utility of the proposal act is not already salient.
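
The difference between the All-Implicit and Explicit-Warrant expansions of a PROPOSAL can be sketched as a function from a strategy name to a list of utterance acts. The tuple encoding of utterance acts and the function name are hypothetical; the ordering of contributor before core follows the dialogue excerpt in 15.

```python
from typing import List, Optional, Tuple

Utterance = Tuple  # hypothetical encoding: (act-type, speaker, hearer, content)


def expand_proposal(
    strategy: str,
    speaker: str,
    hearer: str,
    act: str,
    utility: Optional[int] = None,
) -> List[Utterance]:
    """Expand a PROPOSAL discourse act into utterance acts for one strategy."""
    core = ("propose", speaker, hearer, act)
    if strategy == "all-implicit":
        # The proposal decomposes trivially to the core act.
        return [core]
    if strategy == "explicit-warrant":
        if utility is None:
            raise ValueError("explicit-warrant needs the utility of the act")
        # The warrant IRU is a contributor that precedes the core act.
        contributor = ("say", speaker, hearer, ("utility", act, utility))
        return [contributor, core]
    raise ValueError(f"unknown strategy: {strategy}")
```

Because agents are parameterized to use one strategy consistently, the choice of expansion does not depend on any model of what is actually salient for the hearer.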

In the experiments below, agents are parameterized to use this strategy consistently,
with the result that an agent using the Explicit-Warrant strategy assumes
that the warrant is never salient for the hearer. See [Jordan and Walker, 1995]
for experiments in which an agent attempts to maintain a dynamic model of
what is salient for the other agent. The Explicit-Warrant strategy also occurs
in natural dialogues as shown in the naturally occurring example in dialogue
9b.

This strategy can be used by agents in any of the Design-World tasks discussed
in section 4.4. The Explicit-Warrant strategy provides a test of hypothesis
A1: agents produce Attention IRUs to support the processes of deliberating
beliefs and intentions. It can also be used to test hypothesis A4: the choice
to produce an Attention IRU is related to the degree to which an agent is resource
limited in attentional capacity. In the Standard task this is predicted to
improve the performance of resource limited agents. In the Zero-Nonmatching-Beliefs
task this strategy should increase the likelihood that agents coordinate
their beliefs about the warrants underlying different plan steps.



The Matched-Pair-Inference-Explicit strategy The Matched-Pair-Inference-Explicit
strategy expands the PROPOSAL discourse act to two communicative
acts. See figure 13. The contributor of the proposal consists of a statement about
what is already intended, while the core is a propose utterance act, as in 16-6
followed by 16-7 in one turn.

Footnote: The names of agents who use the Matched-Pair-Inference-Explicit strategy are a numbered
version of the string "IMI" which stands for Implicit acceptance, Match Inference.

NAME:        Proposal-Matched-Pair (?speaker, ?hearer, ?act1)
EFFECT:      (and (desire ?hearer (do ?hearer ?act1) ?utility-act)
                  (salient ?hearer (intend ?speaker ?hearer ?act2)))
CONSTRAINTS: (and (option ?act1)
                  (matched-pair ?act1 ?act2)
                  (salient ?hearer (utility ?act1 ?utility-act))
                  (not (salient ?hearer (intend ?speaker ?hearer ?act2))))
CONTRIBUTOR: (say ?speaker ?hearer (intend ?speaker ?hearer ?act2))
CORE:        (propose ?speaker ?hearer ?act1)

Figure 13: The Proposal plan operator for a Matched-Pair-Inference-Explicit
Agent



(16) 6: IMI2: We agreed to put the purple couch in the study. 

(say agent-imi2 agent-imi intended-51: put-act (agent-imi2 purple couch 
room-1)) 

7: IMI2: Then, let's put the purple rug in the living room. 

(propose agent-imi2 agent-imi option-80: put-act (agent-imi2 purple rug 
room-2)) 

The statement in 16-6 is an IRU, because it realizes information previously
inferred by both agents, and models the IRU in dialogue 11. Matched-Pair-Inference-Explicit
is the variation discussed in section |l| in choice ^. This strategy
is only intended to be used in the Matched-Pair tasks as a way of testing
hypothesis A2, the discourse inference constraint. As figure 13 shows, a constraint
on using this plan is that the speaking agent has inferred a Matched-Pair
for the option being proposed.

Although this strategy is specifically tied to Matched-Pair inferences, it provides
a test of a general strategy for making premises for inferences salient, in
tasks that are inferentially complex, and which also require agents to remain coordinated
on inferences. For example, to generalize this strategy to other cases
of plan-related inferences, the clauses for (Matched-Pair ?act1 ?act2) could be
replaced with the more general (Generates ?act1 ∧ ?act2 ?act3), where the
generates relation is to be inferred [Pollack, 1986, Di Eugenio, 1993].



Note that the effect of using this strategy is not that the hearer makes
the matched pair inference; rather, the effect is that the premise for the desired
inference is salient. A constraint on using this strategy is that this premise is not
already salient. However, agents parameterized with this strategy always assume
that the premise is not salient for the hearer. See [Jordan and Walker, 1995] for
experiments in which agents attempt to maintain a dynamic model of the other
agent's attentional state.
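
The Matched-Pair-Inference-Explicit expansion can be sketched in the same style. The function name, the tuple encoding of acts, and the caller-supplied is_match predicate are hypothetical. Falling back to a bare proposal when the speaker has inferred no match is also an assumption, since the operator's constraints simply do not hold in that case.

```python
from typing import Callable, List, Sequence, Tuple

Act = Tuple[str, str]  # hypothetical encoding: (furniture item, room)


def expand_matched_pair_proposal(
    speaker: str,
    hearer: str,
    act: Act,
    intended_acts: Sequence[Act],
    is_match: Callable[[Act, Act], bool],
) -> List[tuple]:
    """Expand a PROPOSAL, restating the intended act that makes a match."""
    core = ("propose", speaker, hearer, act)
    for earlier in intended_acts:
        if is_match(act, earlier):
            # The contributor makes the premise salient; it does not state
            # the matched-pair inference itself.
            contributor = ("say", speaker, hearer,
                           ("intend", speaker, hearer, earlier))
            return [contributor, core]
    return [core]
```

The hearer still has to draw the Matched-Pair inference; the strategy only ensures that the premise needed for it is in the discourse when the proposal arrives.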

4.6 Plan Evaluation 



Section 3.3 specified a model of how collaborative plans are evaluated in terms
of quality of solution and collaborative effort. Design-World is constructed
in order to be able to measure the quality of a solution as well as
collaborative effort. Section 4.4 defined quality of solution for all of the Design-World
tasks. We want to examine trade-offs in performance between strategy
choices.

Figure 14: Performance distributions showing the effect of AWM parameterization
for dialogues between two All-Implicit Agents when all processing is free.
The three performance distributions are for LOW, MID and HIGH AWM agents.

Figure 15: Performance distributions showing the effect of increased retrieval
cost for each AWM range for dialogues between two All-Implicit Agents. The
three performance distributions are for LOW, MID and HIGH AWM agents, retcost
= .001

These trade-offs depend on the relative contributions of the total cost of
communication, the total cost of inference, and the total cost of retrieval
to both agents' collaborative effort. Thus, to calculate collaborative
effort, we cannot simply add up the number of retrievals, inferences and
messages. Consider that a Consequence IRU that makes an inference explicit
ensures that an inferred belief becomes part of the discourse model. However,
if the inference would have been made anyway, the benefit of the strategy
depends on whether the effort to make the inference without the Consequence
IRU would have been greater than the cost of processing the extra utterance
of the Consequence IRU. A similar argument holds for the potential benefit
of Attention IRUs. Whenever an Attention IRU reduces overall effort for retrieval
while not increasing communication effort to the same degree, it will be
beneficial. This hypothesis is given below in a general form.

HYPOTH-I1: Strategies that reduce collaborative effort without
affecting quality of solution are beneficial.

This hypothesis follows directly from the definition of performance, repeated
here for convenience from section 3.3:

PERFORMANCE = QUALITY OF SOLUTION - COLLABORATIVE EFFORT



We need to introduce parameters for the effort involved with each of the 
component processes because they are not strictly comparable, and because 
these modules are implementation dependent. Thus agents' retrieval, inference
and communicative costs are parameterized by (1) commcost: cost of sending
a message; (2) infcost: cost of inference; and (3) retcost: cost of retrieval
from memory. Collaborative effort is then defined as:



COLLABORATIVE EFFORT =

  (commcost × total messages for both agents)
+ (infcost × total inferences for both agents)
+ (retcost × total retrievals for both agents)
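
As an illustration only, the two definitions above can be written out as a
short Python sketch. The class and function names and the event counts are
hypothetical and are not taken from the Design-World implementation. The
settings commcost = 1, infcost = 1, retcost = .01 and commcost = 10, infcost = 0
echo the figure captions in the next section; the zero retcost in the
communication-heavy setting is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Per-event processing costs; field names mirror the paper's parameters."""
    commcost: float = 0.0  # cost of sending one message
    infcost: float = 0.0   # cost of one inference
    retcost: float = 0.0   # cost of one retrieval from memory


def collaborative_effort(costs, messages, inferences, retrievals):
    """Weighted sum of event counts, each count totalled over both agents."""
    return (costs.commcost * messages
            + costs.infcost * inferences
            + costs.retcost * retrievals)


def performance(quality_of_solution, costs, messages, inferences, retrievals):
    """PERFORMANCE = QUALITY OF SOLUTION - COLLABORATIVE EFFORT."""
    return quality_of_solution - collaborative_effort(
        costs, messages, inferences, retrievals)


# Hypothetical dialogue totals, chosen only for illustration.
counts = dict(messages=40, inferences=120, retrievals=5000)

free = CostParams()                                        # processing is free
retrieval_heavy = CostParams(commcost=1, infcost=1, retcost=0.01)
comm_heavy = CostParams(commcost=10)                       # retcost assumed 0

for label, costs in [("free", free),
                     ("retrieval-heavy", retrieval_heavy),
                     ("communication-heavy", comm_heavy)]:
    print(label, performance(400, costs, **counts))
```

With the same raw counts, the ranking of two strategies can change from one
cost setting to the next, which is why the counts cannot simply be added up.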



We will use these cost parameters to explore three extremes in this space: 
(1) when processing is free; (2) when retrieval effort dominates other processing 
costs; and (3) when communication effort dominates other processing costs. The 
parameters support modeling various instantiations of the agent architecture 
given in figure |^. For example, varying the cost of retrieval models different 
assumptions about how the beliefs database, plan library and working memory 
are implemented. Varying the cost of communication models situations in which 
communication planning is very costly. The relation between the values of these 
parameters and the utilities of the steps in the plan determines experimental 
outcomes, rather than the absolute values. 

As an example of the effect of varying these costs, consider the plots of
performance distributions shown in figures 14 and 15 for low, mid and high
AWM. In these figures, performance is plotted on the x-axis and the number of
simulations at that performance level is given by bars on the y-axis. The
performance distributions in figure 14 demonstrate the increase in QUALITY OF
SOLUTION that we would expect with increases in AWM, given no processing
costs. Figure 15 shows what happens when processing is not free: here a
retrieval cost of .001 means that every memory access reduces quality of
solution by 1/1000 of a point (remember that the utilities of plan steps range
between 10 and 56). As figure 15 shows, the ability to access the whole beliefs
database in reasoning does not always improve performance, since high AWM
agents perform similarly to mid AWM agents.



4.7 Summary: Mapping from Naturally Occurring Data 
to Design World Experiments 

Earlier sections proposed hypotheses about the function of IRUs in human to
human collaborative planning dialogues and presented a model for collaborative
planning dialogues based on those observations. Design-World was then
described as a testbed of the model, together with a number of parameters of
the testbed that are intended to model the features of the human-human
dialogues and support testing of the hypotheses. Here I wish to summarize the
mapping between the naturally occurring dialogues and the design of the
testbed in order to clarify the basis for the experiments in the next section.

These distributions approximate Beta distributions [Wilks, 1962], and this
approximation was used to determine that 200 runs would guarantee stable
results. The Beta distribution with the largest variance, for parameters R and
S greater than or equal to 1, is the uniform distribution. This largest
variance distribution would require approximately 133 samples [Siegel, 195(],
[Wilks, 1962]. An empirical evaluation of the adequacy of this sample size for
three different strategies was tested to see if any differences showed up in
alternate runs of 100; no differences were found.

The testbed and the experimental parameters are based on the following
mapping between human-human collaborative planning dialogues and the testbed.
First, the planning and deliberation aspects of human processing are modeled
with the IRMA architecture, and resource limits on these processes are
modeled by extending the IRMA architecture with a model of Attention/Working
Memory (AWM), which has been shown to model a limited but critical set of
properties of human processing. Second, the processing of dialogue is tied to
the agent architecture. Third, the mapping of a warrant relation between an
act and a belief in naturally occurring examples such as 9b is modeled with a
WARRANT relation between an act and a belief in Design-World, as seen in the
Explicit-Warrant communication strategy in section 4.5. Fourth, the mapping
assumes that arbitrary content based inferences in natural dialogues such as
that discussed in relation to example |ll| can be mapped to content based
inferences in Design-World such as those required for doing well on the
Matched-Pair tasks. Fifth, the mapping is based on the assumption that task
difficulty in naturally occurring tasks such as those in the financial advice
domain can be related to three abstract features: (1) inferential complexity
as measured by the number of premises required for making an inference;
(2) degree of belief coordination required on intentions, inferences and
beliefs underlying a plan; and (3) task determinacy and fault tolerance.
Finally, the mapping assumes that it is reasonable to evaluate the performance
of the agents in collaborative planning dialogues by using domain plan utility
for a measure of the quality of solution and defining the cost to achieve that
solution as collaborative effort, appropriately parameterized.

The details of this mapping specify how the testbed implements the model
of collaborative planning and provides the basis for extrapolating from the 
testbed experimental results to the human-human dialogues that are being mod- 
eled. The testbed provides an excellent environment for testing the hypotheses 
to the extent that the model captures critical aspects of human-human dia- 
logues. 



5 Experimental Results 

5.1 Statistically Evaluating Performance 

The experiments examine the interaction between tasks, communication strate- 
gies and AWM resource limits. Every experiment varies AWM over three ranges: 
LOW, MID, and HIGH. In order to run an experiment on a particular commu- 
nicative strategy for a particular task, 200 dialogues for each AWM range are 
simulated. Because the AWM model is probabilistic, each dialogue simulation 
has a different result. The AWM parameter yields a performance distribution for
very resource limited agents (LOW), agents hypothesized to be similar to human
agents (MID), and resource unlimited agents (HIGH). Sample performance
distributions for QUALITY OF SOLUTION (with no collaborative effort subtracted)
from runs of two All-Implicit agents for each AWM setting are shown in figure
14.

To test our hypotheses, we want to compare the performance of two different
communicative strategies for a particular task, under different assumptions
about resource limits and processing costs. To see the effect of communicative
strategy and AWM over the whole range of AWM settings, we first run a two-way
analysis of variance (anova) with AWM as one factor and communication strategy
as another. The anova tells us whether: (1) AWM alone is a significant
factor in predicting performance; (2) communication strategy alone is a
significant factor in predicting performance; and (3) there is an interaction
between communication strategy and AWM.
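
The following is a minimal sketch of such an analysis, assuming a balanced
fixed-effects design with the same number of runs in every AWM × strategy
cell. It is the textbook computation, not necessarily the procedure used for
the reported results; the function name and the data layout are hypothetical,
and p-values are omitted because they need an F distribution that the Python
standard library does not provide.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA.

    cells maps (awm_level, strategy) -> list of performance scores and must
    contain every combination of levels, with equally many scores per cell.
    Returns the F ratios for both main effects and for the interaction.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("design must be balanced")

    grand = mean(x for scores in cells.values() for x in scores)
    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    # Sums of squares for the main effects, the interaction and the error.
    ss_a = len(b_levels) * n * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = len(a_levels) * n * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_within = sum((x - cell_mean[key]) ** 2
                    for key, scores in cells.items() for x in scores)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_within = len(a_levels) * len(b_levels) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "F_awm": (ss_a / df_a) / ms_within,
        "F_strategy": (ss_b / df_b) / ms_within,
        "F_interaction": (ss_ab / (df_a * df_b)) / ms_within,
        "df_within": df_within,
    }
```

In the setting described here, `cells` would hold six lists of 200 scores, one
for each combination of LOW, MID or HIGH AWM with one of the two strategies.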

However, anova alone does not enable us to determine the particular AWM
range at which a communication strategy aids or hinders performance, and
many of the hypotheses about the benefits of particular communication
strategies are specific to how resource limited an agent is. Furthermore,
whenever strategy affects performance positively for one value of AWM and
negatively for another value of AWM, the potential effects of strategy cannot
be seen from the anova alone. Therefore, we conduct planned comparisons of
strategies using the modified Bonferroni test (hereafter MB) [Keppel, 1982],
within each AWM range setting, to determine which AWM range the strategy
affects [Cohen, 1995], [Keppel, 1982]. On the basis of these comparisons we
can say whether a strategy is beneficial for a particular task for a
particular AWM range.

A strategy A is beneficial as compared to a strategy B, for a
particular AWM range, in the same task situation, with the same
cost settings, if the mean of A is significantly greater than the mean
of B, according to the modified Bonferroni (MB) test.

The converse of beneficial is detrimental:

A strategy A is detrimental as compared to a strategy B, for a
particular AWM range, in the same task situation, with the same cost
settings, if the mean of A is significantly less than the mean of B,
according to the modified Bonferroni (MB) test.

Strategies need not be either beneficial or detrimental; there may be
no difference between two strategies. Also, with the definition given above a
strategy may be both beneficial and detrimental depending on the range
of AWM that the two strategies are compared over, i.e., a strategy may be
beneficial for LOW AWM agents and detrimental for HIGH AWM agents.
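
These definitions can be sketched as a small decision procedure. The sketch
assumes that the planned comparison is the standard pairwise F ratio,
F = n (mean A - mean B)^2 / (2 MS_error), and that MS_error is the pooled
variance of the two samples being compared; both are assumptions about how the
test is computed, and the function name is hypothetical. The default threshold
of 3.88 is the critical F value that the paper reports for p < .05 under the
MB test.

```python
from statistics import mean, variance


def compare_strategies(perf_a, perf_b, critical_f=3.88):
    """Label strategy A relative to strategy B within one AWM range.

    perf_a, perf_b: equal-length lists of performance scores obtained in the
    same task situation with the same cost settings.
    Returns (label, F ratio).
    """
    n = len(perf_a)
    if n != len(perf_b) or n < 2:
        raise ValueError("need two equal-sized samples with at least 2 runs")

    diff = mean(perf_a) - mean(perf_b)
    # Assumption: the error term is the pooled variance of the two samples.
    ms_error = (variance(perf_a) + variance(perf_b)) / 2
    if ms_error == 0:
        f_ratio = float("inf") if diff != 0 else 0.0
    else:
        f_ratio = n * diff ** 2 / (2 * ms_error)

    if f_ratio < critical_f:
        return "no difference", f_ratio
    return ("beneficial" if diff > 0 else "detrimental"), f_ratio
```

Calling the function once per AWM range makes it explicit that the same
strategy can receive different labels for LOW, MID and HIGH AWM agents.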

A difference plot such as that in figure 16 is used to summarize a
comparison of two strategies, strategy 1 and strategy 2. In the comparisons
below, strategy 1 is either Close-Consequence, Explicit-Warrant, or
Matched-Pair-Inference-Explicit and strategy 2 is the All-Implicit strategy.
Differences in performance means between two strategies are plotted on the
Y-axis against AWM ranges on the X-axis. Each point in the plot represents the
difference in the means of 200 runs of each strategy at a particular AWM
range. Each plot thus summarizes the information from 1200 simulated dialogues
(200 runs × 2 strategies × 3 AWM ranges).
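
A minimal sketch of how the points of such a plot could be computed is given
below; the function name and the dictionary layout are illustrative
assumptions, not part of the testbed.

```python
from statistics import mean

AWM_RANGES = ("LOW", "MID", "HIGH")


def difference_points(runs_strategy1, runs_strategy2):
    """Return (awm_range, mean difference) pairs for a difference plot.

    Each argument maps an AWM range to the list of performance scores that the
    strategy obtained at that range (200 scores per range in the experiments).
    A positive difference means strategy 1 did better on average.
    """
    return [(awm, mean(runs_strategy1[awm]) - mean(runs_strategy2[awm]))
            for awm in AWM_RANGES]
```

Whether a given point reflects a reliable effect still has to be decided by
the planned comparison for that AWM range, not by the sign of the difference.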
Figure 16: If Processing is Free, Explicit-Warrant is neither beneficial nor
detrimental for all AWM settings: Strategy 1 is two Explicit-Warrant agents
and strategy 2 is two All-Implicit agents: Task = Standard, commcost = 0,
infcost = 0, retcost =

Figure 17: Explicit-Warrant is beneficial for mid and high AWM agents when
Retrieval dominates processing costs: Strategy 1 is two Explicit-Warrant agents
and strategy 2 is two All-Implicit agents: Task = Standard, commcost = 1,
infcost = 1, retcost = .01

5.2 Standard Task 

Figure 18: Explicit-Warrant is detrimental for low and high AWM agents when
communication effort is high: Strategy 1 is two Explicit-Warrant agents and
strategy 2 is two All-Implicit agents: Task = Standard, commcost = 10, infcost
= 0, retcost =

Remember that the Standard task is defined so that the QUALITY OF SOLUTION
that agents achieve for a DESIGN-HOUSE plan, constructed via the dialogue, is
the sum of the utilities of each valid step in their plan. The task has
multiple correct solutions and is fault tolerant because the point values for
invalid steps in the plan are simply subtracted from the score, with the
effect that agents are not heavily penalized for making mistakes. Furthermore,
the task has low inferential complexity: the only inferences agents are
required to make are those for deliberation and means-end reasoning. In both
of these cases, to make these inferences, agents are only required to access a
single minor premise.

All-Implicit agents do fairly well at the Standard task, under assumptions
that all processing is free, as shown in the performance plot in figure 14.
However, as retrieval costs increase, high AWM agents don't do as well as when
retrieval is free, because they expend too much effort on retrieval during
collaborative planning. Compare the high AWM distribution in figure 14 with
that in figure 15. Thus for the Standard task, high AWM agents have the
potential to benefit from communication strategies that reduce the total
effort for retrieval, when retrieval is not free. In addition, although the
task has minimal inferential complexity, easy access to information that is
used for deliberation, which the Explicit-Warrant strategy provides, could
benefit low AWM agents, since they might otherwise make nonoptimal decisions.
Furthermore, although the task is fault tolerant, agents are still penalized
for making errors since errors do not contribute to performance. Thus for the
Standard task, communication strategies such as Close-Consequence that can
reduce the number of errors could be beneficial. Below we will compare the
All-Implicit strategy to the Explicit-Warrant strategy and the
Close-Consequence strategy.

The experimental performance distributions are not normal and the variance is
not the same over different samples; however, anova is robust against the
violation of these assumptions under the conditions in these experiments
[Cohen, 1995], [Keppel, 1982].

According to the modified Bonferroni test, the significant F values for the
planned comparisons reported below are 3.88 for p < .05, 5.06 for p < .025,
6.66 for p < .01, and 9.61 for p < .002.

In experiments with Close-Consequence only one agent in a dialogue uses the
Close-Consequence strategy because the use of this strategy is constrained to
when the dialogue segment is open. See figure nil. Since only one agent will
ever produce a closing statement for any dialogue segment, only one agent is
given the option in the simulations.

Explicit-Warrant The Explicit-Warrant strategy can be used in the Standard
task to test hypothesis A1: agents produce Attention IRUs to support the
processes of deliberating beliefs and intentions. It can also be used to test
hypothesis A4: the choice to produce an Attention IRU is related to the degree
to which an agent is resource limited in attentional capacity. Thus one
prediction is that the Explicit-Warrant strategy will result in higher
performance for LOW AWM agents even when processing is free, by ensuring that
they can access the warrant and use it in deliberation, thus making better
decisions.

Figure 16 plots the differences in the performance means between the
Explicit-Warrant strategy and the All-Implicit strategy for low, mid and high
AWM agents. A two-way anova exploring the effect of AWM and the
Explicit-Warrant strategy for the Standard task shows that AWM has a large
effect on performance (F = 336.63, p < .000001). There is no main effect for
communicative strategy (F = 1.92, p < 0.16). However, there is an interaction
between AWM and communicative choice (F = 1136.34, p < .000001).

By comparing performance within a particular AWM range for each strategy
we can see which AWM settings interact with communicative strategy. The
planned comparisons using the modified Bonferroni (MB) test show that the
Explicit-Warrant strategy is neither beneficial nor detrimental in the
Standard task, in comparison with the All-Implicit strategy, if all processing
is free (MB(low) = 0.29, ns; MB(mid) = 2.79, ns; MB(high) = 0.39, ns). Note
that there is a trend towards the Explicit-Warrant strategy being detrimental
at MID AWM.

The hypothesis based on the corpus analysis was that low AWM agents 
might benefit from communicative strategies that include IRUs. However, this 
hypothesis is disconfirmed. Further analysis of this result suggests a hypothe- 
sis not apparent from the corpus analysis: any beneficial effect of an IRU can 
be cancelled for resource limited agents because IRUs may displace other in- 
formation from working memory that is more useful. In this case, despite the 
fact that the warrant information is useful for deliberation, making the warrant 
salient displaces information that can be used to generate other options. When
agents are very resource-limited, making an optimal decision is not as
important as being able to generate multiple options.

The Explicit- Warrant strategy can also be used in the Standard task to 
test hypothesis I1: strategies that reduce collaborative effort overall may be
beneficial. Thus, another prediction is that by providing the warrant used in 
deliberating a proposal with every proposal, the Explicit- Warrant strategy has 
the potential to reduce resource consumption when accessing memory has some 
processing cost. 

Figure 17 plots the differences in the performance means between the
Explicit-Warrant strategy and the All-Implicit strategy for low, mid and high
AWM agents when retrieval effort dominates processing. A two-way anova
exploring the effect of AWM and the Explicit-Warrant strategy for the Standard
task, when retrieval cost dominates processing, shows that AWM has a large
effect on performance (F = 330.15, p < .000001). There is also a main effect
for communicative strategy (F = 5.74, p < 0.01), and an interaction between
AWM and communicative choice (F = 1077.64, p < .000001).

The planned comparisons using the MB test to compare performance at each
AWM range show that, in the Standard task, in comparison with the All-Implicit
strategy, the Explicit-Warrant strategy is neither beneficial nor detrimental
for LOW AWM agents (MB(low) = 0.27, ns). However, hypothesis I1 is confirmed
because the Explicit-Warrant strategy is beneficial for MID AWM agents
(MB(mid) = 86.43, p < .002). The Explicit-Warrant strategy also tends towards
improving performance for HIGH AWM agents (MB(high) = 2.07, p < .10). For
higher AWM values, this trend is because the beliefs necessary for
deliberating the proposal are made available in the current context with each
proposal, so that agents don't have to search memory for them.

As an additional test of hypothesis I1, a final experiment tests the
Explicit-Warrant strategy against the All-Implicit strategy in a situation
where the cost of communication dominates other processing costs. Figure 18
plots the differences in the performance means between the Explicit-Warrant
strategy and the All-Implicit strategy for low, mid and high AWM agents when
communication effort dominates processing. A two-way anova exploring the
effect of AWM and the Explicit-Warrant strategy for the Standard task, when
communication effort dominates processing, shows that AWM has a large effect
on performance (F = 409.52, p < .000001). There is also a main effect for
communicative strategy (F = 28.12, p < 0.000001), and an interaction between
AWM and communicative choice (F = 960.24, p < .000001).

The planned comparisons using the MB test to compare performance at each
AWM range show that in this situation, when communication effort dominates
processing, the Explicit-Warrant strategy is neither beneficial nor detrimental
for MID AWM agents (MB(mid) = 0.12, ns). However, the Explicit-Warrant
strategy is detrimental for both LOW and HIGH AWM agents (MB(low) = 7.69,
p < .01; MB(high) = 39.65, p < .01). Since this strategy includes an extra
utterance with every proposal and provides no clear benefits, it is
detrimental to performance in the Standard task when communication effort
dominates processing. Below, when we compare this situation with that in the
Zero-NonMatching-Beliefs task, we will see that this is due to the fact that
the Standard task has low coordination requirements.

Close-Consequence The Close-Consequence strategy of making inferences 
explicit can be used in the Standard task to test hypothesis C4: the choice 
to produce a Consequence IRU is related to a measure of 'how important' the 
inference is. Even though the Standard task is fault tolerant, every invalid step 
reduces the quality of solution of the final plan. Making act-effect inferences 
explicit decreases the likelihood of making this kind of error. 

The difference plot in figure 19 shows performance differences between the
Close-Consequence strategy and the All-Implicit strategy, in the Standard task, 
when all processing is free. A two-way anova exploring the effect of AWM and 
the Close-Consequence strategy in this situation shows that AWM has a large 
effect on performance (F= 249.20, p< .000001), and that there is an interaction 
between AWM and communicative choice (F= 919.27, p< .000001). 

Planned comparisons between strategies for each AWM range show that the
Close-Consequence strategy is detrimental in comparison with All-Implicit for
LOW AWM agents (MB(low) = 8.70, p < .01). This is because generating options
contributes more to performance for agents with low AWM than avoiding
errors, and the additional utterances that make inferences explicit in the
Close-Consequence strategy have the effect of displacing facts that could be
used in means-end reasoning to generate options. There is no difference in
performance for MID AWM agents (MB(mid) = .439, ns).

Figure 19: Close-Consequence can be detrimental in the Standard Task for low
AWM agents and beneficial for high AWM agents. Strategy 1 is the combination
of an All-Implicit agent with a Close-Consequence agent and Strategy 2 is two
All-Implicit agents, Task = Standard, commcost = 0, infcost = 0, retcost =

However, comparisons between the two strategies for high AWM agents show
that the Close-Consequence strategy is beneficial in comparison with
All-Implicit (MB(high) = 171.71, p < .002). See figure 19. This is because
the belief deliberation algorithm increases the probability of high AWM agents
choosing to believe out of date beliefs about the state of the world. The
result is that they are more likely to have invalid steps in their plans. Thus
the Close-Consequence strategy is beneficial because reinforcing the belief
that a furniture item has been used makes it less likely that agents will
believe that they still have that furniture item. This result is not predicted
by any hypotheses, but as discussed earlier, this property of the belief
deliberation mechanism has some intuitive appeal. In any case, this result
provides a data point for the benefit of a strategy for making inferences
explicit when the probability of making an error increases if that inference
is not made.



5.3 Zero NonMatching Beliefs Task 

Remember that the Zero-NonMatching-Beliefs task requires a greater degree
of belief coordination by requiring agents to agree on the beliefs underlying
deliberation (warrants). Thus, it increases the importance of making
particular deliberation-based inferences, and can therefore be used to test
hypotheses A1, A4 and A5. Below we will compare the performance of agents
using the All-Implicit strategy with the Explicit-Warrant strategy in the
Zero-NonMatching-Beliefs task.

Remember that in other tasks, agents do not have to agree on WARRANTS because
in situations in which they know of only one option, they do not need to
retrieve the warrant in order to be able to decide to accept the proposal.
Thus when agents have limited AWM, they may accept a proposal without having
retrieved the warrant.

Figure 20: Explicit-Warrant is beneficial for Zero-NonMatching-Beliefs Task for
LOW and MID AWM agents: Strategy 1 is two Explicit-Warrant agents and
strategy 2 is two All-Implicit agents: Task = Zero-NonMatching-Beliefs,
commcost = 0, infcost = 0, retcost =

Figure 21: Explicit-Warrant is beneficial for Zero-NonMatching-Beliefs Task, for
LOW and MID AWM agents, even when communication cost dominates processing:
Strategy 1 is two Explicit-Warrant agents and strategy 2 is two All-Implicit
agents: Task = Zero-NonMatching-Beliefs, commcost = 10, infcost = 0, retcost
=

Figure 20 plots the mean performance differences of the Explicit-Warrant
strategy and the All-Implicit strategy in the Zero-NonMatching-Beliefs task. A
two-way anova exploring the effect of AWM and communicative strategy for the
Zero-NonMatching-Beliefs task shows that AWM has a large effect on
performance (F = 471.42, p < .000001). There is also a main effect for
communicative strategy (F = 379.74, p < 0.000001), and an interaction between
AWM and communicative choice (F = 669.24, p < .000001).

Comparisons within each AWM range of the two communicative strategies in
this task show that the Explicit-Warrant strategy is highly beneficial for low
and MID AWM agents (MB(low) = 260.6, p < 0.002; MB(mid) = 195.5, p <
0.002). The strategy is also beneficial for high AWM agents (MB(high) = 4.48,
p < 0.05). When agents are resource limited, they may fail to access a warrant.
The Explicit-Warrant strategy guarantees that the agents always can access the
warrant for the option under discussion. Thus, even agents with higher values
of AWM can benefit from this strategy, since the task requires such a high
degree of belief coordination.

Hypothesis I1 can also be tested in this task. We can ask whether it is
possible to drive the total effort for communication high enough to make it
inefficient to choose the Explicit-Warrant strategy over All-Implicit. However,
the benefits of the Explicit-Warrant strategy for LOW and MID AWM agents for
this task are so strong that they cannot be reduced even when communication
cost is high (MB(low) = 246.4, p < 0.002; MB(mid) = 242.7, p < 0.002). See
figure 21. In other words, even when every extra WARRANT message increases
collaborative effort by 10 and reduces performance by 10, if the task is
Zero-NonMatching-Beliefs, resource-limited agents using Explicit-Warrant do
better. Contrast figure 21 with the Standard task and same cost parameters in
figure 18.

However, when communication cost is high, the strategy becomes detrimental
for HIGH AWM agents (MB(high) = 7.56, p < 0.01). These agents can
usually access warrants and the increase in belief coordination afforded by the 
Explicit- Warrant strategy does not offset the high communication cost. 

5.4 Inferential Tasks: Matched pairs 

The two versions of the Matched-Pair task described earlier (1) increase
the inferential complexity of the task and (2) increase the degree of belief
coordination required, by requiring agents to be coordinated on inferences
that follow from intentions that have been explicitly agreed upon. Both tasks
increase inferential difficulty to a small degree: All-Implicit agents do
fairly well at making matched pair inferences. The Matched-Pair-Same-Room task
requires the same inferences as the Matched-Pair-Two-Room task, but these
inferences should be easier to make in the Matched-Pair-Same-Room task since
the inferential premises are more likely to be salient.

The Matched-Pair tasks provide an environment for testing hypotheses A2, 
A3, A4 and A5. The Attention strategy that is used to test these hypotheses is 
the Matched-Pair-Inference-Explicit strategy; this strategy makes the premises 
for matched-pair inferences salient, thus increasing the likelihood of agents mak- 
ing matched-pair inferences. The predictions are that this strategy should be 
beneficial for low and possibly for mid AWM agents, but that high AWM agents 
can access the necessary inferential premises without Attention IRUs.
Furthermore, we predict that the beneficial effect should be stronger for the
Matched-Pair-Two-Room task.

Figure 22: Matched-Pair-Inference-Explicit is beneficial for LOW AWM agents in
Matched-Pair-Same-Room. Strategy 1 is two Matched-Pair-Inference-Explicit
agents and Strategy 2 is two All-Implicit agents, Task =
Matched-Pair-Same-Room, commcost = 0, infcost = 0, retcost =

Figure 23: Matched-Pair-Inference-Explicit is beneficial for LOW, MID and high
AWM agents in the Matched-Pair-Two-Room Task. Strategy 1 is two
Matched-Pair-Inference-Explicit agents and Strategy 2 is two All-Implicit
agents, Task = Matched-Pair-Two-Room, commcost = 0, infcost = 0, retcost =

Figure 24: The Matched-Pair-Inference-Explicit strategy is beneficial for LOW,
MID and HIGH AWM agents in the Matched-Pair-Two-Room Task even with
communication cost of 10. Strategy 1 is two Matched-Pair-Inference-Explicit
agents and Strategy 2 is two All-Implicit agents, Task =
Matched-Pair-Two-Room, commcost = 10, infcost = 0, retcost =

Figure 22 plots the performance differences between All-Implicit agents and
Matched-Pair-Inference-Explicit agents for the Matched-Pair-Same-Room task.
A two-way anova exploring the effect of AWM and communicative strategy in
this task shows that AWM has a large effect on performance (F = 323.93, p <
.000001). There is no main effect for communicative strategy (F = .03, ns), but
there is an interaction between AWM and communicative choice (F = 1101.51,
p < .000001).

Comparisons within AWM ranges between agents using the All-Implicit
strategy and agents using the Matched-Pair-Inference-Explicit strategy in the
Matched-Pair-Same-Room task (figure 22) show that the
Matched-Pair-Inference-Explicit strategy is beneficial for low AWM agents
(MB(low) = 4.47, p < .05), but not significantly different for either mid or
high AWM agents. In the Matched-Pair-Same-Room task the content of the IRU was
recently inferred and is likely to still be salient; thus the beneficial
effect is relatively small and is restricted to very resource limited agents.

In contrast, in the Matched-Pair-Two-Room task, the effect on performance
of the Matched-Pair-Inference-Explicit strategy is much larger, as we predicted.
Figure 23 plots the mean performance differences of agents using the
Matched-Pair-Inference-Explicit strategy and those using the All-Implicit
strategy. The All-Implicit agents do not manage to achieve the same levels of
mutual inference as Matched-Pair-Inference-Explicit agents. A two-way ANOVA
exploring the effect of AWM and communicative strategy in this task shows that
AWM has a large effect on performance (F = 171.79, p < .000001). There is a
main effect for communicative strategy (F = 57.12, p < .001), and an
interaction between AWM and communicative choice (F = 567.34, p < .000001).
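As an illustration of the kind of analysis reported above, the following is a
minimal sketch of a two-way analysis of variance for a balanced design, with
AWM range and communicative strategy as the two factors. It is not the analysis
code used for these experiments: the function name `two_way_anova`, the
nested-list data layout and the toy scores are all assumptions made for the
example.

```python
from statistics import mean


def two_way_anova(cells):
    """F statistics for a balanced two-way design (illustrative sketch).

    cells[i][j] is a list of scores for AWM level i and strategy j;
    every cell is assumed to hold the same number of scores (n >= 2).
    """
    a_levels, b_levels, n = len(cells), len(cells[0]), len(cells[0][0])

    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in cells]
    b_means = [mean(x for row in cells for x in row[j]) for j in range(b_levels)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    # Sums of squares for the two main effects, the interaction and the error.
    ss_a = b_levels * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a_levels * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a_levels)
        for j in range(b_levels)
    )
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a_levels)
        for j in range(b_levels)
        for x in cells[i][j]
    )

    df_a, df_b = a_levels - 1, b_levels - 1
    ms_err = ss_err / (a_levels * b_levels * (n - 1))
    return {
        "F_awm": (ss_a / df_a) / ms_err,
        "F_strategy": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / (df_a * df_b)) / ms_err,
    }


# Toy scores (hypothetical): rows are two AWM levels, columns two strategies.
# The strategy that helps at one AWM level hurts at the other.
toy = [[[1, 3], [5, 7]],
       [[5, 7], [1, 3]]]
result = two_way_anova(toy)
print(result)
```

In this toy data set the two strategies have the same overall mean, so the main
effect for strategy is zero while the interaction is large. This is the same
qualitative pattern as a result with no main effect for communicative strategy
but an interaction between AWM and communicative choice.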

Comparisons within AWM ranges between agents using the All-Implicit strategy
and agents using the Matched-Pair-Inference-Explicit strategy in the
Matched-Pair-Two-Room task (figure 23) show that the
Matched-Pair-Inference-Explicit strategy is beneficial for low, mid and high
AWM agents (mb(low) = 21.94, p < .01; mb(mid) = 7.71, p < .01; mb(high) =
38.85, p < .002). In other words, this strategy is highly effective in
increasing the ability of low, mid and high AWM agents to make matched pair
inferences in the Matched-Pair-Two-Room task.

We predicted the strategy to be beneficial for low and possibly for mid AWM 
agents because it gives agents access to premises for inferences which they would 
otherwise be unable to access. This confirms the effect of the hypothesized 
DISCOURSE INFERENCE CONSTRAINT. However, we did not expect it to be 
beneficial for high AWM agents. This surprising effect is due to the fact that, 
in the case of higher AWM values, the Matched-Pair-Inference- Explicit strategy 
keeps the agents coordinated on which inference the proposing agent intended 
in a situation in which multiple inferences are possible. In other words, when 
agents have high AWM they can make divergent inferences, and a strategy
of making inferential premises salient improves agents' inferential coordination. 
Thus the strategy controls inferential processes in a way that was not predicted 
based on the corpus analysis alone. 

Hypothesis I1 can also be tested in this task. We can ask whether it is
possible to drive the effort for communication high enough to make it inefficient 
to choose the Matched-Pair-Inference-Explicit strategy over All-Implicit. 
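One way to see what this test asks is to write performance as a raw task score
minus weighted processing effort. The sketch below assumes that form of the
performance measure, using the commcost, infcost and retcost parameters named
in the figure captions. The function names and all of the numbers are
hypothetical and are not results from the testbed.

```python
def performance(raw_score, messages, inferences, retrievals,
                commcost=0, infcost=0, retcost=0):
    """Assumed performance measure: raw score minus weighted effort."""
    effort = commcost * messages + infcost * inferences + retcost * retrievals
    return raw_score - effort


def break_even_commcost(raw_explicit, msgs_explicit, raw_implicit, msgs_implicit):
    """Communication cost at which the two strategies perform equally.

    Assumes infcost = retcost = 0. Returns None when the explicit strategy
    sends no more messages than the implicit one, because raising commcost
    can then never make the explicit strategy the worse choice.
    """
    extra_messages = msgs_explicit - msgs_implicit
    if extra_messages <= 0:
        return None
    return (raw_explicit - raw_implicit) / extra_messages


# Hypothetical dialogue totals for one pair of agents under each strategy.
explicit = performance(700, messages=50, inferences=0, retrievals=0, commcost=10)
implicit = performance(300, messages=40, inferences=0, retrievals=0, commcost=10)
print(explicit, implicit)                      # 200 -100
print(break_even_commcost(700, 50, 300, 40))   # 40.0
```

With these made-up totals the explicit strategy still wins at a communication
cost of 10, and communication would have to cost 40 per message before the two
strategies were equal. The larger the gain in raw score, the harder it is to
drive communication effort high enough to reverse the benefit.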

Figure 24 plots the mean performance differences between these two strategies
when communication cost is high. Comparisons within each AWM range show that
this strategy is still beneficial for low, mid and high AWM agents even with a
high communication cost (mb(low) = 19.10, p < .01; mb(mid) = 3.94, p < .05;
mb(high) = 10.46, p < .01). In other words, it would be difficult to find a
task situation that required coordinating on inference in which this strategy
was not beneficial. This result is strong support for the DISCOURSE INFERENCE
CONSTRAINT, which may explain the prevalence of this strategy in naturally
occurring dialogues [Sadock, 1978, Webber and Joshi, 1982, Cohen, 1987].



5.5 Zero Invalids Task 

Remember that the Zero-Invalids Task is a fault-intolerant version of the task in
which any invalid intention invalidates the whole plan. Thus the Zero-Invalids 
task provides an environment for testing hypotheses C2 and C4 with respect to 
the inferences made explicit by the Close-Consequence strategy. 

Figure 25 plots the mean performance differences between agents using the
Close-Consequence strategy and agents using the All-Implicit strategy in the
Zero-Invalids task. A two-way ANOVA exploring the effect of AWM and
communicative strategy in this task shows that AWM has a large effect on
performance (F = 223.14, p < .000001). There is a main effect for communicative
strategy (F = 75.81, p < .001), and an interaction between AWM and
communicative choice (F = 103.38, p < .000001).

The Close-Consequence strategy was detrimental in the Standard task for
low AWM agents. Comparisons within AWM ranges between agents using the
All-Implicit strategy and agents using the Close-Consequence strategy in the
Zero-Invalids task show that there are no differences in performance for low
AWM agents in the fault-intolerant Zero-Invalids task (mb(low) = 3.64, ns).
However, the Close-Consequence strategy is beneficial for mid and high AWM
agents (mb(mid) = 26.62, p < .002; mb(high) = 267.72, p < .002). In other
words, this strategy is highly beneficial in increasing the robustness of the
planning process by decreasing the frequency with which agents make mistakes.
This is a direct result of rehearsing the act-effect inferences, making it
unlikely that attention-limited agents will forget these important inferences.
Figure 25: Close-Consequence is beneficial for the Zero-Invalids Task for mid
and high AWM agents. Strategy 1 is the combination of an All-Implicit agent
with a Close-Consequence agent and Strategy 2 is two All-Implicit agents, Task
= Zero-Invalid, commcost = 0, infcost = 0, retcost =



6 Discussion 

This paper showed how agents' choice in communicative action can be designed
to mitigate the effect of their resource limits in the context of particular
features of a collaborative planning task. In an earlier section I presented a
model of collaborative planning in dialogue and discussed a number of
parameters that can affect either the efficacy of the final plan or the
efficiency of the collaborative planning process. Then in section 5, I
presented the results of experiments testing hypotheses about the effects of
these parameters on collaborative planning dialogues. These results contribute
to the development of the model of collaborative planning dialogue presented
here. In addition, since the testbed implementation is compatible with many
current theories, these results could be easily incorporated into other
dialogue planning algorithms [Logan et al., 1994, Traum, 1994, Guinn, 1994,
Grosz and Sidner, 1990, Levesque et al., 1990, Grosz and Kraus, 1993,
Chu-Carrol and Carberry, 1995], inter alia.

A secondary goal of this paper was to argue for a particular methodology for
dialogue theory development. The method was specified in section 4.1. The
Design-World testbed was then introduced, and sections 4.4 and 4.5 described
the parameterizations of the model that support testing the hypotheses. Four
parameters for communicative strategies were tested: (1) All-Implicit; (2)
Close-Consequence; (3) Explicit-Warrant; and (4)
Matched-Pair-Inference-Explicit. Four parameters for tasks were tested: (1)
Standard; (2) Zero-Nonmatching-Beliefs; (3) Matched-Pair (MP); (4)
Zero-Invalid. Three situations of varying processing effort were tested.

In this section, I will first summarize the hypotheses and the experimental
results in section 6.1, then I will discuss how the experimental results might
generalize to situations not implemented in the testbed. Section 6.4 proposes
future work and section 6.5 consists of concluding remarks.



6.1 Summary of Results 

The hypotheses that were generated by the statistical analysis of the dialogue
corpora are repeated below for convenience from the earlier sections, including
section 4.6.



• HYPOTH-C1: Agents produce Consequence IRUs to demonstrate that
they made the inference that is made explicit.

• HYPOTH-C2: Agents choose to produce Consequence IRUs to ensure
that the other agent has access to inferrable information.

• HYPOTH-C3: The choice to produce a Consequence IRU is directly
related to a measure of 'how hard' the inference is.

• HYPOTH-C4: The choice to produce a Consequence IRU is directly
related to a measure of 'how important' the inference is.

• HYPOTH-C5: The choice to produce a Consequence IRU is directly
related to the degree to which the task requires agents to be coordinated on
the inferences that they have made.

• HYPOTH-A1: Agents produce Attention IRUs to support the processes
of deliberating beliefs and intentions.

• HYPOTH-A2: There is a DISCOURSE INFERENCE CONSTRAINT whose
effect is that inferences in dialogue are derived from propositions that are
currently discourse salient (in working memory).

• HYPOTH-A3: The choice to produce an Attention IRU is related to the
degree of inferential complexity of a task as measured by the number of
premises required to make task related inferences.

• HYPOTH-A4: The choice to produce an Attention IRU is related to the
degree to which an agent is resource limited in attentional capacity.

• HYPOTH-A5: The choice to produce an Attention IRU is related to the
degree to which the task requires agents to be coordinated on the
inferences that they have made.

• HYPOTH-I1: Strategies that reduce collaborative effort without affecting
quality of solution are beneficial.



Below I will summarize the experimental results reported in section 5 with
respect to the hypotheses above.

Hypotheses C3 and C4 were tested by comparing the Close-Consequence 
strategy with the All-Implicit strategy in the Standard task. In this experi- 
mental setup, the inference made explicit by the Consequence IRU was neither 
hard to make nor critical for performance. Hypothesis C3 was only weakly 
tested by the experiments because agents always make this inference. The
results in the corresponding figure show that the Close-Consequence strategy is
detrimental for low AWM agents. This is because IRUs can displace useful
information from
working memory and because the inference made explicit with this IRU is not 
'hard enough'. 
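The displacement effect can be illustrated with a deliberately simplified
stand-in for limited working memory. The sketch below is not the AWM model
used in the testbed; it swaps in a fixed-capacity recency buffer, and the class
name `RecencyMemory`, its methods and the example beliefs are all hypothetical.

```python
from collections import deque


class RecencyMemory:
    """Fixed-capacity recency buffer (a simplification, not the AWM model)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when it is full.
        self.items = deque(maxlen=capacity)

    def store(self, belief):
        # Re-stating a belief refreshes it instead of storing a duplicate.
        if belief in self.items:
            self.items.remove(belief)
        self.items.append(belief)

    def is_salient(self, belief):
        return belief in self.items


def hears_iru(capacity):
    """Store two useful beliefs, then an IRU; report what is still salient."""
    memory = RecencyMemory(capacity)
    memory.store("option: put green couch in study")
    memory.store("score of green couch is 50")
    memory.store("IRU: green couch is now in study")  # already inferrable
    return memory.is_salient("option: put green couch in study")


print(hears_iru(capacity=2))  # False: the IRU displaced an older, useful belief
print(hears_iru(capacity=5))  # True: a larger memory keeps everything
```

Under these assumptions, an utterance that adds nothing new still costs a
highly limited agent one slot of memory, which is one way a redundant utterance
can be detrimental when the inference it states is easy to make.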

The Standard task also provides a weak test of hypothesis C4. The fact
that the Standard task is fault tolerant means that making the inference is not
as critical as it might be. However, errors can result from either not making
the inference or forgetting it once it is made. At lower values of AWM, the
probability of such errors is not that high. However, the results in the
corresponding figure show that the probability of error is higher for high AWM
agents in this case, because of their belief deliberation algorithm, and thus
the Close-Consequence strategy is beneficial for high AWM agents, even in the
Standard task.

The Zero-Invalids task provides another test of hypothesis C4 by increasing
the importance of the inference made explicit by the Close-Consequence strategy.
Figure 25 shows that hypothesis C4 is confirmed because the Close-Consequence
strategy is beneficial for mid and high AWM agents (the difference for low AWM
agents is not significant). In addition to the
reasons discussed for the Standard task, this strategy is beneficial for high AWM 
agents because they have more potential to improve their scores by ensuring that 
they don't make errors. 

The experiments did not test hypothesis C1 because agents in the testbed
are not designed to actively monitor evidence from other agents as to what
inferences they might have made. Hypothesis C5 was not tested by the
experiments because agents always rectify the situation if they detect a
discrepancy in beliefs about act-effect inferences: they reject proposals whose
preconditions do not hold.

Hypotheses A1, A4 and A5 were tested by experiments in which the
Explicit-Warrant strategy was compared with the All-Implicit strategy in the
Standard task. Hypothesis A1 is disconfirmed for low AWM agents. The
corresponding figure shows that the Explicit-Warrant strategy is neither
beneficial nor detrimental for low AWM agents for the Standard task, when
processing is free. This counterintuitive result arises because, when agents
are highly resource limited, IRUs can displace other information that is more
useful.

To test hypothesis I1 in this situation, we also examined two situations where
processing is not free. When communication cost dominates other processing
costs, the Explicit-Warrant strategy is detrimental for low and high AWM
agents. However, when retrieval cost dominates other processing costs, the
Explicit-Warrant strategy is beneficial for mid AWM agents and there is a trend
toward a beneficial effect for high AWM agents. Thus these two situations show
that hypothesis I1 is confirmed: processing effort has a major effect on whether
a strategy is beneficial.

We also tested hypotheses A1, A4 and A5 with experiments in which the
Explicit-Warrant strategy was compared with the All-Implicit strategy in the
Zero-Nonmatching-Beliefs task (see the corresponding figures). This task
increases the importance of making deliberation-based inferences by requiring
agents to be coordinated on these inferences in order to do well on the task.
In this situation, we saw a very large beneficial effect for the
Explicit-Warrant strategy, which is not diminished by increasing communication
effort. Thus in situations in which agents are required to be coordinated on
these inferences, strategies which include Attention IRUs can be very
important.

Hypotheses A2, A3, A4, and A5 were tested by experiments comparing the
Matched-Pair-Inference-Explicit strategy with the All-Implicit strategy in the
two versions of the Matched-Pair task. The results shown in figures 22 and 23
provide support for these hypotheses. However, these results also included
an unpredicted benefit of Attention IRUs for inferentially complex tasks where
agents must coordinate on inferences. Figure 23 shows that both mid and high
AWM agents' performance improves with the Matched-Pair-Inference-Explicit
strategy. This can be explained by the fact that Attention IRUs increase the
likelihood that agents will make the same inference, rather than divergent
inferences, when multiple inferences are possible.
Furthermore, although the Matched-Pair-Inference-Explicit strategy is
specifically tied to Matched-Pair inferences, it provides a test of a general
strategy for making premises for inferences salient, when tasks are
inferentially complex and require agents to remain coordinated on inferences.
Thus it provides strong support for the DISCOURSE INFERENCE CONSTRAINT. To
generalize this strategy to other cases of plan-related inferences, the clauses
in the strategy plan operator that specifically refer to matched-pair
inferences can be replaced with a more general inference, e.g. the more general
(Generates ?act1 ∧ ?act2 ?act3), where the generates relation is to be inferred
[Pollack, 1986, Grosz and Sidner, 1990, Di Eugenio, 1993].



Hypothesis I1 was tested by examining extremes in cost ratios for retrieval
effort and communication effort whenever a hypothesis about the beneficial
effects of IRUs was confirmed. High communication effort can make the
Explicit-Warrant strategy detrimental in the Standard task, but it does not
eliminate the benefits of the Explicit-Warrant strategy in the
Zero-Nonmatching-Beliefs task. Figure 24 shows that high communication effort
does not eliminate the benefits of the Matched-Pair-Inference-Explicit strategy
in the Matched-Pair-Two-Room task. Thus the strategy of making premises for
inferences salient is robust against extremes in processing effort.

6.2 Generalizability of the Results 



This section addresses concerns raised in [Hanks et al., 1993] that simulation
is 'experimentation in the small'. Hanks writes that ([Hanks et al., 1993],
section 5.1.5):

The ultimate value - arguably the only value - of experimentation
is to constrain or otherwise inform the designer of a system that 
solves interesting problems. In order to do so the experimenter must 
demonstrate three things: 

1. that her results - the relationships she demonstrates between 
agent characteristics and world characteristics - extend beyond 
the particular agent, world, and problem specification she stud- 
ied, 

2. that the solution to the problem area she studied in isolation 
will be applicable when that same problem area is encountered 
in a larger, more complex world, and 

3. that the relationship demonstrated experimentally actually con- 
strains or somehow guides the design of a larger more realistic 
agent. 

Items 1 to 3 are all different ways of saying that the results should
generalize beyond the specifics of the experiment, and this after all is a basic
issue with all experimental work. Typically generalizations can be shown by
a series of multiple experiments modifying multiple variables as we have done
here. For example, the modifications to the task are specifically designed to test
whether beneficial communicative strategies generalize across tasks. However,
we might also ask to what extent the variables manipulated in the simulation
abstract out key properties of real situations. Below I will briefly discuss
why the results presented above are potentially generalizable. I will focus on
generalizations along three dimensions: (1) task (or environmental) properties;
(2) agent architectural properties; and (3) agent behaviors. These dimensions
are the same as those in Cohen's 'ecological triangle' [Cohen et al., 1989].
Generalizations about tasks. The Design-World task was selected as a simple
planning task that requires negotiation of each step. The structure of this
task is isomorphic to a subcomponent of many collaborative planning tasks. In
addition, to test generalizability of hypothesized benefits across tasks, we
examined more complex variants of the task by manipulating three abstract
features: (1) inferential complexity as measured by the number of premises
required for making a task related inference; (2) degree of belief coordination
required on intentions, inferences and beliefs underlying a plan; and (3) the
task determinacy and fault tolerance of the plan. These general features can
certainly be applied to other tasks in other domains. In fact it is difficult
to think of a task or domain in which these features could not be applied.



Generalizations about agent properties. Design-World agents are artificial
agents that are designed to model the resource limited qualities of human
agents. The planning and deliberation aspects of human processing are modeled
with the IRMA architecture, and resource limits on these processes are modeled
by extending the IRMA architecture with a model of Attention/Working
Memory (AWM) which has been shown to model a limited but critical set of
properties of human processing. The way that agents process dialogue is tied
to the agent architecture.

The experimental results will extend to dialogues between artificial agents
to the extent that those agents exhibit similar cognitive properties. Here, we
looked at a resource bound on access to memory as modeled by a size of memory
subset limit; however, size is directly correlated to time to access memory.
Artificial agents are often time limited in rapidly changing worlds, so it
seems quite plausible that artificial agents would benefit from similar
communicative strategies. For example, I would predict that agents in the
Phoenix simulation testbed would benefit from the strategies discussed here
[Cohen et al., 1989]. In other work artificial agents do 'make inferences
explicit' by communicating to other agents partial computations when the other
agent might have been able to make these computations [Cohen et al., 1989,
Durfee et al., 1987, Turner, 1994]. In addition, defining inferential
complexity as a direct consequence of the number of premises simultaneously in
memory bears a strong resemblance to problems artificial processors have when a
computation requires a large working set [Stone, 1987].

The experimental results should extend to dialogues between humans and
artificial agents because Design-World agents are designed to model humans.
However it may be desirable to change the definition of collaborative effort for
modeling human-computer interaction to allow the computer to handle processing
that is easy for the computer to do and for the human to handle processing
that is easy for the human to do. Furthermore, most of the claims about the
AWM model are based on a limited set of human working memory properties,
and these properties will also hold for other cognitively based architectures
such as SOAR [Laird et al., 1987, Lehman et al., 1991].



Generalizations about agent behaviors. In this work the agent behaviors
that were tested were the agent communication strategies. One reason to believe 
that the strategies are general to human-human discourse is that they were based 
on observed strategies in different corpora of natural collaborative planning dia- 
logues. It is possible to find all three types of IRUs in the Trains, Map-Task
and Design corpora [Traum, 1994, Carletta, 1992, Pollack et al., 1982,
Whittaker et al., 1993], as well as in the financial advice domain.
In addition to this empirical evidence, there are further reasons why we
might expect generalizations.

The communicative acts and discourse acts used by Design-World agents
are similar to those used in [Carletta, 1992, Cawsey et al., 1992, Sidner, 1994,
Stein and Thiel, 1993]. Thus communicative strategies based on these acts
should be implementable in any of these systems.

The experimental results based on these strategies should generalize to other
discourse situations because the strategies are based on general relations
between utterance acts and underlying processes, such as supporting
deliberation and inference. For example, the WARRANT relation between an act
and a belief in the naturally occurring examples discussed earlier was modeled
with a WARRANT relation between an act and a belief in Design-World, as seen in
the Explicit-Warrant communication strategy. The claims made about the use of
the Explicit-Warrant communication strategy should generalize to any dialogue
planning domain where agents use warrants to support deliberation.

Similarly, content-based inferences in natural dialogues, such as that
discussed in relation to an earlier example, were modeled with content-based
inferences in Design-World such as those required for doing well on the
Matched-Pair tasks. This inferential situation was designed to test the
DISCOURSE INFERENCE CONSTRAINT, that inferences in dialogue are restricted to
premises that are currently salient. Both experimental and corpus based
evidence was provided in
support of the discourse inference constraint. The claims made about the use of 
the Matched-Pair-Inference-Explicit communication strategy, based on experi- 
mental evidence, should generalize to any dialogue strategy where agents make 
premises for inferences available, and to any planning domain where agents are 
required to make content based inferences in support of deliberation or planning. 

The evaluation metrics applied to these strategies should also generalize 
whenever domain plan utility is a reasonable measure of the quality of solution 
for a dialogue task. 



6.3 Relation to Other Work 



This work builds on previous work on cooperative dialogue [Webber and Joshi,
1982, Pollack et al., 1982, Litman, 1985, Pollack, 1986, Joshi et al., 1986,
Grosz and Sidner, 1986, Finin et al., 1986, Carberry, 1989, Clark and Schaefer,
1989, Whittaker and Stenton, 1985, Grosz and Sidner, 1990, Cohen, 1987], and
the results are applicable to other current research on collaborative planning
[Sidner, 1994, Cohen and Levesque, 1991, Heeman and Hirst, 1995, Chu-Carrol and
Carberry, 1995, Guinn, 1994, Traum, 1994, Dahlback, 1991, Lochbaum, 1994, Grosz
and Kraus, 1993, Young et al., 1994].



The agent architecture and the model of deliberation and means-end reasoning
is based on the work of [Bratman et al., 1988] and [Doyle, 1992], and on
Pollack's Tileworld simulation environment [Pollack and Ringuette, 1990].
The use of IRMA as an underlying model of intention deliberation to provide
a basis for a collaborative planning model was first proposed in [Walker, 1992,
Walker, 1993a, Walker, 1993b], and has been incorporated into other work
[Grosz and Kraus, 1995, Young et al., 1994]. The architecture includes a
specific model of limited working memory, but most of the claims about the
model are based on its recency and frequency properties, which might also be
provided by other cognitively based architectures such as SOAR [Laird et al.,
1987, Lehman et al., 1991]. Since the testbed architecture is consistent with
that assumed in other work, the experimental results should be generalizable to
those frameworks.

Footnote: [Walker, 1995a] discusses the differences between an AWM-like
attentional model and Grosz and Sidner's stack model of attentional state
[Grosz and Sidner, 1986, Sidner, 1979, Grosz, 1977]. See also [Rose et al.,
1995] for a discussion of other discourse phenomena for which the AWM model may
be useful.

The relationship between discourse acts and domain-based options and
intentions in this work is based on Litman's model of discourse plans [Litman,
1985, Litman and Allen, 1990] and is similar to the approach in [Carletta, 1992,
Cawsey et al., 1992, Traum, 1994]. The emphasis on autonomy at each stage of
the planning process and the belief reasoning mechanism of Design-World agents
is based on the theory of belief revision and the multi-agent simulation
environment developed in the Automated Librarian project [Galliers, 1989,
Galliers, 1991a, Cawsey et al., 1992, Logan et al., 1994].



The Design-World testbed is based on the methods used in the Tileworld and
Phoenix simulation environments: rapidly changing robot worlds in which an
artificial agent attempts to optimize reasoning and planning [Pollack and
Ringuette, 1990, Hanks et al., 1993, Cohen et al., 1989]. Tileworld is a single
agent world in which the agent interacts with its environment, rather than with
another agent. Design-World uses similar methods to test a theory of the effect
of resource limits on communicative behavior between two agents.

Design-World is also based on the method used in Carletta's JAM simulation
for the Edinburgh Map-Task [Power, 197^, Carletta, 1992]. JAM is based on
the Map-Task Dialogue corpus, where the goal of the task is for the planning
agent, the instructor, to instruct the reactive agent, the instructee, how to get 
from one place to another on the map. JAM focuses on efficient strategies for 
recovery from error and parametrizes agents according to their communicative 
and error recovery strategies. Given good error recovery strategies, Carletta 
argues that 'high risk' communicative strategies are more efficient, but did not 
attempt to quantify efficiency. In contrast, the approach here provides a way 
of quantifying what is an effective or efficient strategy, and the results suggest 
that a combination of the agents' resource limitations and the task definition 
determine when strategies are efficient. Future work could test Carletta's claims 
about recovery strategies within this extended framework. 

To my knowledge, none of this earlier work has considered the factors that 
affect the range of variation in communicative choice, or the effects of different 
choices, or measured how communicative choice affects the construction of a 
collaborative plan and the ability of the conversants to stay coordinated. Nor 
have other theories of collaborative planning been explicit about the agent ar- 
chitecture, or tested specific ideas about resource bounds in dialogue, and none 
have used utility as the basis for agents' communicative choice. In addition, no 
earlier work on cooperative task-oriented dialogue argued that conversational 
agents' resource limits and task complexity are major factors in determining 
effective conversational strategies in collaboration. 



6.4 Future Work 

A promising avenue for future work is to investigate beneficial strategies for 
teams of heterogeneous agents. In the experiments here, pairs of agents in 
dialogue were always parameterized with the same resource limits. Pilot studies 
of dialogues between heterogeneous agents suggest that strategies that are not 
effective for homogeneous agents may be effective for heterogeneous ones. For
example, in [Walker, 1993b] I tested an Attention IRU strategy in which agents
would tell one another about all the options they knew about at the beginning 
of planning each room. This strategy is not beneficial for homogeneous agents 
because IRUs can displace other useful information. However if one agent is 
not limited, then it can be helpful for the resource limited agent to exploit the 
capabilities of the more capable agent by telling the other agent important facts
before it forgets them. 

Another extension would be to extend the agent communication strategies
or to test additional ones. For example, other work proposes a number of
strategies for information selection and ordering in dialogue and provides some
evidence that these strategies are efficient or efficacious [Carenini and
Moore, 1993, Suthers, 1993, Zukerman and McConachy, 1993, Chu-Carrol and
Carberry, 1995]. Support for these claims could be provided by Design-World
experiments in which agents used these strategies to communicate.

Future work could also modify the properties of the world or of the task. 
For example, it would be possible to make Design-World more like Tileworld
by making the world change in the course of the task, by adding or removing 
furniture. 

These results may also be incorporated as input into decision algorithms in
which agents decide online which strategy to pursue, and future work could
investigate additional factors that determine when strategies are effective in
collaborative planning dialogues. The results presented here show what
information an agent should consider. For example, a comparison between low,
mid and high AWM agents shows how to design decision algorithms for agents who
have to decide whether to expend additional effort.
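As a rough sketch of what such an online decision could look like, the code
below encodes some of the within-range comparisons reported in section 5 as a
lookup table and falls back to All-Implicit when no benefit was found. The
table layout, the function name `choose_strategy` and the fallback rule are
assumptions made for illustration; they are not part of the testbed, and the
table covers only the results at zero communication cost.

```python
# AWM ranges in which each strategy was found beneficial, keyed by
# (task, strategy). Entries summarize comparisons reported in section 5.
BENEFICIAL = {
    ("Matched-Pair-Same-Room", "Matched-Pair-Inference-Explicit"): {"low"},
    ("Matched-Pair-Two-Room", "Matched-Pair-Inference-Explicit"): {"low", "mid", "high"},
    ("Zero-Invalids", "Close-Consequence"): {"mid", "high"},
}

DEFAULT_STRATEGY = "All-Implicit"


def choose_strategy(task, awm_range):
    """Pick a strategy for a task and AWM range (hypothetical decision rule)."""
    for (known_task, strategy), ranges in BENEFICIAL.items():
        if known_task == task and awm_range in ranges:
            return strategy
    # No reported benefit for this combination: say nothing redundant.
    return DEFAULT_STRATEGY


print(choose_strategy("Zero-Invalids", "low"))
print(choose_strategy("Zero-Invalids", "high"))
print(choose_strategy("Matched-Pair-Two-Room", "mid"))
```

A fuller decision algorithm would also need the processing costs as inputs,
since hypothesis I1 shows that the same strategy can change from beneficial to
detrimental as communication or retrieval effort grows.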

Another promising avenue is to make the agents capable of remembering and
learning from past mistakes so that they can adapt their strategies to the
situation [Alterman et al., 1991].

Finally, these results should be incorporated into the design of multi-agent 
problem-solving systems and into systems for human-computer communication, 
such as those for teaching, advice and explanation, where for example the use 
of particular strategies might be premised on the abilities of the learner or 
apprentice. 



6.5 Concluding Remarks 

The goal of this paper was to show how agents' choice in communicative action, 
their algorithms for language behavior, can be designed to mitigate the effect 
of their resource limits in the context of particular features of a collaborative 
planning task. In this paper, I first motivate a number of hypotheses based on 
a statistical analysis of natural collaborative planning dialogues. Then a func- 
tional model of collaborative planning dialogues is developed based on these 
hypotheses, including parameters that are hypothesized to affect the general- 
izability of the model. The model is then implemented in a testbed in which 
these parameters can be varied, and the hypotheses are tested. 

The method used here can be contrasted with other work on dialogue mod- 
eling. Much previous work on dialogue modeling only carries out part of the 
process described above: only the initial part of the process up to specifying a 
functional model is completed. Follow-on research that is based on these
models must judge the model according to subjective criteria such as how well
it fits researchers' intuitions or how elegant the model is. The models developed
here on the basis of empirical evidence can also be judged according to these 
subjective criteria, but this work carries out additional steps to further test and 
refine the model suggested by the corpus analysis. Implementing a model with 
parameters to test the generalizability of the model and testing hypotheses in 
a testbed implementation provides a way to check subjective evaluations and 
suggests many ways in which our initial hypotheses must be refined and further 
tested. 

The Design-World testbed is the first testbed for conversational systems 
that systematically introduces several different types of independent parameters 
that are hypothesized to affect the efficacy of a collaborative plan negotiated 
through a dialogue, and the efficiency of that dialogue process. Experiments 
in the testbed examined the interaction between (1) agents' resource limits in 
attentional capacity and inferential capacity; (2) agents' choice in communica- 
tion; and (3) features of communicative tasks that affect task difficulty such as 
inferential complexity, degree of belief coordination required, and tolerance for 
errors. The results verified a number of hypotheses that depended on particu- 
lar assumptions about agents' resource limits that were not possible to test by 
corpus analysis alone. 

Several unpredicted and counterintuitive results were also demonstrated by 
the experiments. First, the task property of belief coordination in combination 
with resource limits (as in the Zero-Nonmatching-Beliefs and Matched-Pair 
tasks) was shown to produce the most robust benefits for IRUs, rather than 
resource limits alone as originally hypothesized. Second, I predicted that IRUs 
would always be beneficial for LOW AWM agents, but found that IRUs can be 
detrimental for these agents through a side effect of displacing other, more 
useful, beliefs from working memory. Third, it would seem plausible that HIGH 
AWM agents should always perform better than either LOW or MID AWM agents 
since these agents always have access to more information. However, the results 
showed that there are two situations in which this is not an advantage: (1) when 
accessing information has some cost; and (2) when access to multiple beliefs can 
lead agents to make divergent inferences. In this case, restricting agents to a 
small shared working set is a natural way to limit inferential processes. This 
limit intuitively corresponds to potential benefits of limited working memory 
for humans and explains how humans manage to coordinate on inferences in 
conversation [Levinson, 1985, Grosz, 1977, Joshi, 1978]. 
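
The displacement effect noted in the second result can be illustrated with a toy sketch. It is not the AWM model used in the testbed: it assumes, purely for illustration, that working memory is a fixed-size first-in-first-out store, and the function name, capacities and belief labels are all hypothetical.

```python
from collections import deque


def remaining_beliefs(capacity, utterances):
    """Return the beliefs still accessible after hearing all utterances.

    Hypothetical simplification: working memory is a fixed-size FIFO
    queue, so each new item evicts the oldest one once it is full.
    """
    memory = deque(maxlen=capacity)
    for belief in utterances:
        memory.append(belief)
    return list(memory)


# Invented belief labels; "repeat-b3" stands in for a redundant utterance (IRU).
dialogue = ["b1", "b2", "b3", "repeat-b3"]

low = remaining_beliefs(3, dialogue)    # small capacity: "b1" is displaced
high = remaining_beliefs(10, dialogue)  # large capacity: nothing is displaced
```

Under these assumptions the redundant utterance costs the small-capacity agent its oldest belief, while the large-capacity agent retains everything. Whether that loss matters depends on how useful the displaced belief was for the task.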

These results clearly demonstrate that factors not previously considered in 
dialogue models must be taken into account if claims of cooperativity, efficiency, 
or efficacy are to be supported. In addition, I have shown that a theory of dia- 
logue that includes a model of resource-limited processing can account for both 
the observed language behavior in human-human dialogue and the experimental 
results presented here. 